Managing Dynamic Metadata and Context

(1)


Mehmet S. Aktas

(2)


Context as Service Metadata

• Context can be
  - interaction-independent
    - slowly varying, quasi-static service metadata
  - interaction-dependent
    - dynamically generated metadata produced by the interaction of services
    - information associated with a single service, a session (service activity), or both
• Dynamic Grid/Web Service Collections
  - are assembled to support a specific task
  - can be workflows or audio/video collaborative sessions
  - generate metadata and have a limited lifetime

(3)

Motivating Cases

• Multimedia collaboration domain
  - The Global Multimedia Collaboration System (Global MMCS) provides an A/V conferencing system.
  - Collaborative A/V sessions carry varying types of metadata, such as real-time metadata describing audio/video streams.
  - Characteristics: widely distributed services, metadata of events (archival data), mostly read-only
• Workflow-style applications in GIS/Sensor Grids
  - Pattern Informatics (PI) is an earthquake forecasting system.
  - Sensor grid data services generate events when an event of a certain magnitude (such as a fault displacement) occurs.
  - These events fire off various services: filtering and analyzing raw data, generating images and maps.
  - Characteristics: any number of widely distributed services can

(4)


[Figure: PI workflow involving the WMS GUI, WFS, HPSearch, Data Filter, PI Code, a second Data Filter and the Context Information Service, with data files at http://..../..../..txt and http://..../..../tmp.xml; the numbered arrows correspond to the steps listed below.]

session
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>

user profile
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>profile information related to WMS</content>
</context>

activity
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>shared data for HPSearch activity</content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>

shared state
<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>

SOAP header for Context
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="WSCTX URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service>http..</context-service>
      <context-manager>http..</context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>
  ...
</soap:Envelope>

• session-associated dynamic metadata
• user profile
• activity-associated dynamic metadata
• service-associated dynamically generated metadata

What are examples of dynamically generated metadata in a real-life scenario?

3, 4: WMS starts a session and invokes HPSearch, passing the session id, to run the workflow script for the PI Code.
5, 6, 7: HPSearch runs the workflow script, which generates an output file in GML format (and PDF format) as its result.
8: HPSearch writes the URI of the output file into the context.
9: WMS polls the Context Service for this information.
10: WMS retrieves the output file generated by the workflow script and generates a map.
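To make steps 8 to 10 concrete, here is a minimal Java sketch of the write-then-poll interaction: HPSearch stores the output-file URI under a context key, and the WMS reads it back until the context's timeout expires. The class and method names (`ContextStore`, `setContext`, `getContext`) are hypothetical and are not the WS-Context operations; time is passed in explicitly, and the timeout unit is an assumption because `timeout="100"` does not state one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Context Information Service.
// Names are illustrative only; they are not the WS-Context operations.
class ContextStore {
    // A stored value plus the absolute time at which it expires.
    private static final class Entry {
        final String value;
        final long expiresAt;

        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // HPSearch side (step 8): store a value that lives for `timeout` units after `now`.
    synchronized void setContext(String key, String value, long timeout, long now) {
        entries.put(key, new Entry(value, now + timeout));
    }

    // WMS side (step 9): return the value, or null if absent or expired.
    // Expired entries are removed lazily on access.
    synchronized String getContext(String key, long now) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (now >= e.expiresAt) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }
}
```

A WMS-side client would call `getContext` on each polling round and fetch the file (step 10) once a non-null URI comes back.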

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content>HPSearch-associated additional data generated during execution of the workflow.</content>
</context>

(5)

Practical Problem

• We need a Grid Information Service for managing all information associated with services in Gaggles, for:
  - correlating activities of widely distributed services
    - workflow-style applications
  - management of events in multimedia collaboration
    - providing information to enable
      - real-time replay/playback
      - session failure recovery
  - enabling uniform query capabilities
    - "Give me the list of services satisfying C:{a,b,c..} QoS

(6)


Motivations

• Managing small-scale, highly dynamic metadata, as in dynamic Grid/Web Service collections
• Performance limitations of point-to-point service communication approaches for managing stateful service information
• Lack of support for uniform hybrid query capabilities over both static and dynamic context information
• Lack of support for adaptation to instantaneous changes in client demands
• Lack of support for distributed session

(7)

Research Issues I

• Performance
  - Efficient mediator metadata strategies for service communication: high performance and persistency
• Efficient access request distribution
  - How to choose a replica server to best serve a client request?
  - How to provide adaptation to instantaneous changes in client demands?
• Fault tolerance
  - High availability of information

(8)


Research Issues II

• Consistency
  - Provide consistency across the replicated copies of the data
• Flexibility
  - Accommodating a broad range of application domains, such as read-dominated and read/write-dominated
• Interoperability
  - Being compatible with a wide range of applications
  - Providing data models and programming interfaces
    - to perform hybrid queries over all service metadata
    - to enable real-time replay/playback or session failure recovery

(9)

Proposed System

Hybrid WS-Context Service

• Fault-tolerant and high-performance Grid Information Service
  - Caching module
  - Publish/Subscribe for fault tolerance, distribution and consistency enforcement
  - Database backend and Extended UDDI Registry
• WS-I compatible uniform programming interface
  - Specification with abstract data models and a programming interface that combines WS-Context and UDDI in one hybrid service to manage service metadata
  - Hybrid functions operate on both metadata spaces
  - Extended WS-Context functions operate on session metadata
  - Extended UDDI functions operate on

(10)


Distributed HYBRID Grid Information Services

[Figure: clients (WSDL over HTTP(S)) and HYBRID replica servers (Replica Server-2 ... Replica Server-N) acting as publishers and subscribers on a topic-based publish-subscribe messaging system]

(11)

Detailed architecture of the system

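[Figure: detailed architecture. A client reaches the service via WSDL over HTTP(S); components shown include the Ext-UDDI and WS-Context interfaces, Querying and Publishing handlers, the Expeditor, Access, Storage and Sequencer modules, and a JDBC connection.]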

(12)


Key Design Features

• External metadata service
  - Extended UDDI Service for handling interaction-independent metadata
• Cache
  - Integrated cache for all service metadata
• Access
  - Redirecting client requests to an appropriate replica server
• Storage
  - Replicating data on an appropriate replica server
• Consistency enforcement

(13)

Extended UDDI XML Metadata Service

• An extended UDDI XML Metadata Service
  - Alternative to OGC Web Registry Services
• It supports different types of metadata
  - GIS Metadata Catalog (functional metadata)
  - User-defined metadata ((name, value) pairs)
• It provides unique capabilities
  - Up-to-date service registry information (leasing)
  - Dynamic aggregation of geospatial services
• It enables advanced query capabilities
  - Geospatial queries

(14)


TupleSpaces Paradigm and JavaSpaces

• TupleSpaces [Gelernter-99]
  - A data-centric, asynchronous communication paradigm
  - Communication units are tuples (data structures)
• JavaSpaces [Sun Microsystems]
  - Java-based, object-oriented implementation
  - Spaces are transactional and secure
    - mutually exclusive access to objects
  - Spaces are persistent
    - temporal and spatial uncoupling
  - Spaces are associative
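As an illustration of associative lookup and of the write/read/take operations associated with tuple spaces, the following sketch stores tuples as `Object[]` and matches them against templates in which `null` acts as a wildcard, mirroring the JavaSpaces convention for template fields. It is a toy in-memory model, not the JavaSpaces API (`net.jini.space.JavaSpace`): transactions, leases, persistence and blocking waits are all omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory tuple space: write, read (non-destructive) and take (destructive).
// Not the JavaSpaces API; transactions, leases, persistence and blocking are omitted.
class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    // Add a copy of the tuple to the space.
    synchronized void write(Object... tuple) {
        tuples.add(tuple.clone());
    }

    // Return a copy of the first matching tuple and leave it in the space, or null.
    synchronized Object[] read(Object... template) {
        int i = find(template);
        return i < 0 ? null : tuples.get(i).clone();
    }

    // Remove and return the first matching tuple, or null. The lock makes take
    // mutually exclusive, so two callers can never remove the same tuple.
    synchronized Object[] take(Object... template) {
        int i = find(template);
        return i < 0 ? null : tuples.remove(i);
    }

    // Associative lookup: same arity, and each non-null template field must equal
    // the tuple field (null acts as a wildcard).
    private int find(Object[] template) {
        for (int i = 0; i < tuples.size(); i++) {
            Object[] t = tuples.get(i);
            if (t.length != template.length) {
                continue;
            }
            boolean match = true;
            for (int j = 0; j < t.length && match; j++) {
                match = template[j] == null || template[j].equals(t[j]);
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

The producer and consumer never reference each other, only the tuple's content, which is the data-centric uncoupling the slide refers to.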

(15)

Publish/Subscribe Paradigm and NaradaBrokering

• Publish-subscribe communication paradigm
  - Message-based asynchronous communication
  - Participants are decoupled both in space and in time
• Open-source NaradaBrokering software
  - Topic-based publish/subscribe messaging system
  - Runs on a network of cooperating broker nodes
  - Provides support for a variety of QoS guarantees, such as low
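The sketch below shows the topic-based publish/subscribe idea in a single Java process: a publisher addresses a topic rather than a recipient, and every handler subscribed to that topic receives the message. `TopicBroker` is a hypothetical toy, not the NaradaBrokering API; it delivers synchronously and has no broker network, so only decoupling in space is modeled, not decoupling in time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy topic-based publish/subscribe broker in one process (not NaradaBrokering).
// Publishers name a topic, never a recipient; delivery here is synchronous.
class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Register a handler for a topic.
    synchronized void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Deliver the message to every current subscriber of the topic and return
    // how many handlers received it (0 if nobody is subscribed).
    int publish(String topic, String message) {
        List<Consumer<String>> targets;
        synchronized (this) {
            // Copy so handlers run outside the lock and may subscribe safely.
            targets = new ArrayList<>(subscribers.getOrDefault(topic, new ArrayList<>()));
        }
        for (Consumer<String> handler : targets) {
            handler.accept(message);
        }
        return targets.size();
    }
}
```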

(16)


Caching Strategy

• Integrated caching capability for both UDDI-type and WS-Context-type metadata
  - Lightweight implementation of JavaSpaces
  - Data sharing, associative lookup and persistency
  - Both WS-Context-type and common UDDI-type standard operations
• The system stores all keys and modest-size values in memory, while large values are stored in the database.
  - We assume that today's servers are capable of holding such small metadata in cache.
  - All accesses to modest-size metadata happen in memory.
• WS-Context-type metadata is backed up into MySQL.
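The storage policy above can be sketched as follows, assuming a byte-size threshold separates modest-size values from large ones (the slide gives no value) and using a hypothetical `FakeDatabase` map as a stand-in for the MySQL back-end. Keys of all entries stay in memory; large values are written straight to the database, and modest-size entries are persisted only when `backup()` runs, for example on the 10-second backup period used in the experiments.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the MySQL back-end; a real system would use JDBC.
class FakeDatabase {
    final Map<String, String> rows = new HashMap<>();
}

// Sketch of the caching policy: all keys stay in memory, modest-size values
// are cached in memory and persisted on backup(), large values go to the database.
class HybridCache {
    private final int maxInMemoryBytes; // assumed threshold; not given on the slide
    private final FakeDatabase db;
    private final Map<String, String> memory = new HashMap<>();
    private final Set<String> largeKeys = new HashSet<>(); // keys whose values live in the DB
    private final Set<String> dirty = new HashSet<>();     // cached entries not yet backed up

    HybridCache(int maxInMemoryBytes, FakeDatabase db) {
        this.maxInMemoryBytes = maxInMemoryBytes;
        this.db = db;
    }

    synchronized void put(String key, String value) {
        if (value.getBytes(StandardCharsets.UTF_8).length <= maxInMemoryBytes) {
            memory.put(key, value);
            largeKeys.remove(key);
            dirty.add(key);
        } else {
            memory.remove(key);
            dirty.remove(key);
            largeKeys.add(key);
            db.rows.put(key, value); // large value: written straight to the database
        }
    }

    synchronized String get(String key) {
        String cached = memory.get(key);
        if (cached != null) {
            return cached; // served from memory
        }
        return largeKeys.contains(key) ? db.rows.get(key) : null;
    }

    // Periodic persistence (e.g. every 10 seconds); returns the number of rows written.
    synchronized int backup() {
        for (String key : dirty) {
            db.rows.put(key, memory.get(key));
        }
        int written = dirty.size();
        dirty.clear();
        return written;
    }

    synchronized boolean isInMemory(String key) {
        return memory.containsKey(key);
    }
}
```

The trade-off this illustrates: a longer backup period means fewer database writes on the request path, at the cost of losing at most one period of modest-size updates on a crash.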

(17)

Performance Model and Measurements

Operation                     Average ± error (ms)   Stddev (ms)
Hybrid WS-Context Inquiry     12.29 ± 0.02           0.48
Extended UDDI Inquiry         17.68 ± 0.06           0.84

Test machine: P4, 3.4 GHz, 1 GB memory, Java SDK 1.4.2; client and services on the same machine
Simulation parameters: metadata size 1.7 KB

(18)


Hybrid WS-Context Caching Approach

Persistency investigation

• The figure shows the average execution time for varying backup frequencies.
• The system performance remains stable as long as backups happen no more often than every 10 seconds.

(19)


Hybrid WS-Context Caching Approach

Performance investigation

• 49% performance increase in inquiry functions and 53% performance gain in publication functions, compared to the database solution.
• System processing overhead is less than 1 millisecond.
Simulation parameters: Backup

(20)


Hybrid WS-Context Caching Approach

Message rate scalability investigation

• This figure shows the system behavior under increasing message rates.
• The system scales up to 940 inquiry messages/second and 480 publication messages/second.

(21)


Hybrid WS-Context Caching Approach

Message size scalability investigation

• This figure shows the system behavior under increasing message sizes.
• The system performs well for small contexts: performance remains the same between 100 bytes and 10 KB.
Simulation parameters: backup frequency every 10 seconds; registry size 5000 metadata; observations 200
• This figure shows the system behavior under increasing message sizes between 10 KB and 100 KB.
• The system spends an additional ~7

(22)


Access: Request Distribution

• Pub-sub system-based message distribution
• Broadcast-based request dissemination using a hashing scheme
  - Keys are hashed to values (topics) that run from 1 to 1000.
  - Each replica holder subscribes to the topics (hash values) of the keys it holds.
  - Each access request is broadcast on the topic corresponding to the key.
  - Replica holders unicast a response with a copy of the context on demand.
• Advantages
  - Does not flood the network with access request messages
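A minimal Java sketch of the hashing scheme, assuming `String.hashCode()` as the hash function (the slide does not name one): each key maps to one of 1000 topics, storing a context subscribes its holder to that topic, and a request reaches only the holders subscribed to the key's topic. The in-process `TopicRouter` and `ReplicaHolder` classes are hypothetical stand-ins for broadcasting over the pub-sub system, and collecting replies in a list stands in for the unicast responses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of hash-based access-request distribution over topics 1..1000.
// String.hashCode() is an assumed hash function; the source does not name one.
class TopicRouter {
    static final int NUM_TOPICS = 1000;
    private final Map<Integer, List<ReplicaHolder>> subscriptions = new HashMap<>();

    // Map any key to a topic in 1..1000 (floorMod keeps negative hashes in range).
    static int topicFor(String key) {
        return 1 + Math.floorMod(key.hashCode(), NUM_TOPICS);
    }

    void subscribe(int topic, ReplicaHolder holder) {
        List<ReplicaHolder> list = subscriptions.computeIfAbsent(topic, t -> new ArrayList<>());
        if (!list.contains(holder)) {
            list.add(holder);
        }
    }

    // "Broadcast" the request on the key's topic: only subscribers of that topic
    // see it, and each one that actually holds the key replies with its copy.
    List<String> request(String key) {
        List<String> replies = new ArrayList<>();
        for (ReplicaHolder h : subscriptions.getOrDefault(topicFor(key), new ArrayList<>())) {
            String copy = h.lookup(key);
            if (copy != null) { // a holder may share the topic via a hash collision
                replies.add(h.name + ":" + copy);
            }
        }
        return replies;
    }
}

class ReplicaHolder {
    final String name;
    private final Map<String, String> contexts = new HashMap<>();
    private final TopicRouter router;

    ReplicaHolder(String name, TopicRouter router) {
        this.name = name;
        this.router = router;
    }

    // Storing a context also subscribes this holder to the topic of its key.
    void store(String key, String value) {
        contexts.put(key, value);
        router.subscribe(TopicRouter.topicFor(key), this);
    }

    String lookup(String key) {
        return contexts.get(key);
    }
}
```

Because a request is delivered only on one of 1000 topics, servers that hold no key hashing to that topic never see it, which is the "does not flood the network" advantage.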

(23)


Access Distribution Experiment

Test Methodology

[Figure: test timeline with intervals $T_1$, $T_2$, $T_3$]
$\mathrm{Time} = T_1 + T_2 + T_3$

Simulation parameters: backup frequency every 10 seconds
• The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
• We determine the average end-to-end cost of metadata access.
• We run the system for 25000 observations.
• Gridfarm and

(24)


Distribution experiment result

• The figure shows average results for every 1000 observations. We have 25000 continuous observations.
• The average transfer time shows that continuous access distribution does not degrade performance.
• The figure shows the time required for the various activities of access request distribution.
• The average overhead of distribution using the pub-sub

(25)

Optimizing Performance

Dynamic migration/replication

• Dynamic migration/replication
  - A methodology for creating temporary copies of a context in the proximity of its requestors
  - Autonomous decisions
    - the replication decision belongs to the server
• Algorithm based on [Rabinovich et al, 1999]
  - The system keeps a popularity record (number of access requests) for each copy and flushes it at regular time intervals.
  - The system periodically checks local data for dynamic migration or replication.
  - Unpopular server-initiated copies are deleted.
  - Popular copies are moved to where they are wanted.
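The periodic decision step can be illustrated with a simplified Java sketch. It uses the threshold values from the experiment parameters (deletion 0.03 requests/second, replication 0.18 requests/second, decisions every 100 seconds), counts requests per origin site, and flushes the counts at each decision. This is an assumption-laden simplification, not the algorithm of Rabinovich et al.: migration is not modeled separately, and "replicate towards the busiest remote origin" is an illustrative rule chosen for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Popularity record for one copy of a context: request counts per origin site.
class CopyStats {
    final boolean serverInitiated; // temporary copy created by the system itself
    final Map<String, Integer> requestsByOrigin = new HashMap<>();

    CopyStats(boolean serverInitiated) {
        this.serverInitiated = serverInitiated;
    }

    void recordAccess(String origin) {
        requestsByOrigin.merge(origin, 1, Integer::sum);
    }
}

// Simplified decision rule (hypothetical; not the full Rabinovich et al. algorithm).
class ReplicationPolicy {
    static final double DELETION_THRESHOLD = 0.03;    // requests/second
    static final double REPLICATION_THRESHOLD = 0.18; // requests/second

    // Returns "DELETE", "REPLICATE:<origin>" or "KEEP", then flushes the counters.
    static String decide(CopyStats copy, String localSite, double intervalSeconds) {
        int total = 0;
        String busiestRemote = null;
        int busiestCount = 0;
        for (Map.Entry<String, Integer> e : copy.requestsByOrigin.entrySet()) {
            total += e.getValue();
            if (!e.getKey().equals(localSite) && e.getValue() > busiestCount) {
                busiestRemote = e.getKey();
                busiestCount = e.getValue();
            }
        }
        copy.requestsByOrigin.clear(); // popularity record is flushed every interval

        // Unpopular server-initiated copies are deleted.
        if (copy.serverInitiated && total / intervalSeconds < DELETION_THRESHOLD) {
            return "DELETE";
        }
        // Heavy demand from one remote site: create a copy near those requestors.
        if (busiestRemote != null && busiestCount / intervalSeconds >= REPLICATION_THRESHOLD) {
            return "REPLICATE:" + busiestRemote;
        }
        return "KEEP";
    }
}
```

With a 100-second interval, 2 requests give 0.02 requests/second (below the deletion threshold) and 20 requests from one remote site give 0.2 requests/second (above the replication threshold).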

(26)


[Figure: test timeline with intervals $T_1$, $T_2$, $T_3$]
$\mathrm{Time} = T_1 + T_2 + T_3$

Simulation parameters:
  message size / message rate: 2.7 KB / 10 msg/sec
  replication decision frequency: every 100 seconds
  deletion threshold: 0.03 requests/second
  replication threshold: 0.18 requests/second
  registry size: 1000 metadata in Indianapolis
• The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
• We determine the mean end-to-end metadata access time.
• We run the system for approximately 45 minutes on Gridfarm and complexity machines.

(27)

• The figure shows average results for every 100 seconds.
• The decrease in average latency shows that

(28)


Storage: Replica content placement

• Pub-sub system for replica content placement
• Each node keeps a Replica Server Map
  - A newly joining node sends a multicast probe message when it joins the network.
  - Each network node responds with a unicast message to make itself discoverable.
• Selection of replica server(s) for content placement
  - Select a node based on a proximity weighting factor
• Sending the storage request to the selected replica servers
  - Step 1: the initiator unicasts a storage request to each selected replica server.
  - Step 2: the recipient server stores the context and becomes a subscriber to the topic of that context.
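Proximity-based selection from the Replica Server Map might look like the sketch below. The slide does not define the proximity weighting factor, so the weight `1 / (1 + rtt)`, computed from a round-trip time assumed to be measured during the probe/reply exchange, is a hypothetical choice, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a Replica Server Map with proximity-based replica selection.
// The weighting formula is a hypothetical choice; the slide does not define it.
class ReplicaServerMap {
    private final Map<String, Double> rttMillis = new HashMap<>();

    // Called for each unicast reply to the newcomer's multicast probe.
    void onProbeReply(String server, double measuredRttMillis) {
        rttMillis.put(server, measuredRttMillis);
    }

    // Assumed proximity weighting factor: closer servers get larger weights.
    static double proximityWeight(double rtt) {
        return 1.0 / (1.0 + rtt);
    }

    // Pick the k servers with the highest weight, excluding the initiator itself.
    List<String> selectReplicaServers(String self, int k) {
        List<String> candidates = new ArrayList<>(rttMillis.keySet());
        candidates.remove(self);
        candidates.sort(Comparator.comparingDouble(
                (String s) -> proximityWeight(rttMillis.get(s))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

The initiator would then unicast its storage request to each returned server (step 1), and each recipient would store the context and subscribe to its topic (step 2).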

(29)


Fault-tolerance experiment

Testing Setup

Backup frequency: every 10 seconds
• The test system consists of NaradaBrokering server(s) and four hybrid WS-Context instances separated by significant network distances.
• We determine the average end-to-end time for replica content creation.
• We run the system continuously for 25000 observations.
• Gridfarm and Teragrid

(30)


Fault-tolerance experiment result

• The system was continuously tested for 25000 observations.
• The results indicate that continuous operation does not degrade performance.
• The figure shows the results gathered from the fault-tolerance experiments.
• The overhead of replica creation increases in the order of

(31)

Consistency enforcement

• Pub-sub system for enforcing consistency
• Primary-copy approach
  - Updates of the same data are carried out at a single server.
  - NTP-based synchronized timestamps impose an order on write operations on the same data.
• Update distribution
  - Step 1: the initiator forwards (unicasts) an update request to the primary-copy holder.
  - Step 2: the primary-copy holder performs the update and returns an acknowledgement.
• Update propagation
  - The primary copy pushes (broadcasts) updates of a context on the topic (hash value) corresponding to the key of the context,
  - if the primary copy realizes that a stale copy exists in
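The primary-copy scheme can be sketched in Java as follows. Timestamps are assumed to be milliseconds from NTP-synchronized clocks and are passed in explicitly; a write is applied only if its timestamp is newer than the one already stored, so no copy ever moves back to an older write. Pushing to the `replicas` list stands in for broadcasting on the context's topic; `VersionedContext` and `PrimaryCopyHolder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// One copy of a context plus the timestamp of the write it currently holds.
// Timestamps are assumed to be milliseconds from NTP-synchronized clocks.
class VersionedContext {
    String value;
    long timestamp;

    VersionedContext(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    // Apply a write only if it is newer than the stored one; returns true if applied.
    // Since every copy uses the same rule, copies converge on the newest write.
    synchronized boolean applyIfNewer(String newValue, long writeTimestamp) {
        if (writeTimestamp <= timestamp) {
            return false; // older or duplicate write: ignored
        }
        value = newValue;
        timestamp = writeTimestamp;
        return true;
    }
}

// Hypothetical primary-copy holder: all updates of this context go through it.
class PrimaryCopyHolder {
    final VersionedContext primary;
    final List<VersionedContext> replicas = new ArrayList<>();

    PrimaryCopyHolder(VersionedContext primary) {
        this.primary = primary;
    }

    // Steps 1 and 2: the initiator forwards the update here, the primary applies
    // it, and the return value serves as the acknowledgement. Propagation: the new
    // version is pushed to every replica (standing in for a topic broadcast).
    boolean update(String newValue, long writeTimestamp) {
        boolean applied = primary.applyIfNewer(newValue, writeTimestamp);
        if (applied) {
            for (VersionedContext replica : replicas) {
                replica.applyIfNewer(newValue, writeTimestamp);
            }
        }
        return applied;
    }
}
```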

(32)


Consistency Enforcement Experiment

Test Methodology

[Figure: test timeline with intervals $T_1$, $T_2$, $T_3$]
$\mathrm{Time} = T_1 + T_2 + T_3$

Backup frequency: every 10 seconds; message size: 2.7 KB
• The test system consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
• We determine the average time required for enforcing consistency.
• We run the system for 25000 observations.
• Gridfarm and

(33)

Consistency Enforcement Test Result

• We have 25000 continuous observations.
• The average transfer time shows that continuous operation does not degrade performance.
• The figure shows the results gathered from the consistency experiments.
• The results indicate that the overhead of consistency

(34)


Comparison of Experiment Results

• The figure shows the results gathered from the distribution, fault-tolerance and consistency experiments.
• The results indicate that the overhead of integrating JavaSpaces

(35)

Contribution

• We have shown that communication among services can be achieved with efficient mediator metadata strategies.
  - Efficient mediator services allow us to perform collective operations, such as queries, on subsets of all available metadata in a service conversation.
• We have shown that an efficient decentralized metadata system can be built by integrating JavaSpaces with the Publish/Subscribe paradigm.
  - Fault tolerance, distribution and consistency can be achieved with a system processing overhead of a few milliseconds.
• We have shown that adaptation to instantaneous changes in client demands can be achieved in decentralized metadata management.
• We have introduced data models and programming