(1)

Managing Dynamic Metadata and Context

Mehmet S. Aktas

(2)

2 of 34

Context as Service Metadata

- Context can be
  - interaction-independent
    - slowly varying, quasi-static service metadata
  - interaction-dependent
    - dynamically generated metadata as a result of the interaction of services
    - information associated with a single service, a session (service activity), or both
- Dynamic Grid/Web Service Collections
  - assembled to support a specific task
  - can be workflows and audio/video collaborative sessions
  - generate metadata and have a limited lifetime

(3)

Motivating Cases

- Multimedia Collaboration domain
  - Global Multimedia Collaboration System (Global MMCS) provides an A/V conferencing system.
  - collaborative A/V sessions with varying types of metadata, such as real-time metadata describing audio/video streams
  - characteristics: widely distributed services, metadata of events (archival data), mostly read-only
- Workflow-style applications in GIS/Sensor Grids
  - Pattern Informatics (PI) is an earthquake forecasting system.
  - sensor grid data services generate events when an event of a certain magnitude (such as fault displacement) occurs
  - firing off various services: filtering, analyzing raw data, generating images, maps

(4)

What are the examples of dynamically generated metadata in a real-life example?

[Figure: workflow diagram. Labels: WMS GUI, WFS, HP Search, Data Filter, PI Code, Data Filter, Context Information Service, http://..../..../..txt, http://..../..../tmp.xml; numbered steps 1, 2, 3,9, 4, 5,6,7, 8]

Kinds of metadata:
1. session associated dynamic metadata
2. user profile
3. activity associated dynamic metadata
4. service associated dynamically generated metadata

Steps:
3,4: WMS starts a session and invokes HPSearch to run the workflow script for PI Code with a session id
5,6,7: HPSearch runs the workflow script and generates an output file in GML format (and PDF format) as the result
8: HPSearch writes the URI of the output file into Context
9: WMS polls the information from Context Service
10: WMS retrieves the output file generated by the workflow script and generates a map

session:

    <context xsd:type="ContextType" timeout="100">
      <context-service>http://.../WMS</context-service>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <service>http://.../WMS</service>
        <service>http://.../HPSearch</service>
      </activity-list>
    </context>

user profile:

    <context xsd:type="ContextType" timeout="100">
      <context-service>http://.../HPSearch</context-service>
      <parent-context>http://../abcdef:012345</parent-context>
      <content> profile information related WMS </content>
    </context>

activity:

    <context xsd:type="ContextType" timeout="100">
      <context-service>http://.../HPSearch</context-service>
      <parent-context>http://../abcdef:012345</parent-context>
      <content> shared data for HPSearch activity </content>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <service>http://.../DataFilter1</service>
        <service>http://.../PICode</service>
        <service>http://.../DataFilter2</service>
      </activity-list>
    </context>

shared state:

    <context xsd:type="ContextType" timeout="100">
      <context-id>http://../abcdef:012345</context-id>
      <context-service>http://.../HPSearch</context-service>
      <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
    </context>

service associated metadata:

    <context xsd:type="ContextType" timeout="100">
      <context-service>http://.../HPSearch</context-service>
      <content> HPSearch associated additional data generated during execution of workflow. </content>
    </context>

SOAP header for Context:

    <?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://www.w3...">
      <soap:Header encodingStyle="WSCTX URL" mustUnderstand="true">
        <context xmlns="ctxt schema" timeout="100">
          <context-id>http..</context-id>
          <context-service> http.. </context-service>
          <context-manager> http.. </context-manager>
          <activity-list mustUnderstand="true" mustPropagate="true">
            <p-service>http://../WMS</p-service>
            <p-service>http://../HPSearch</p-service>
          </activity-list>
        </context>
      </soap:Header>
      ...

(5)

Practical Problem

- We need a Grid Information Service for managing all information associated with services in Gaggles, for:
  - correlating activities of widely distributed services
    - workflow-style applications
  - management of events in multimedia collaboration
    - providing information to enable
      - real-time replay/playback
      - session failure recovery
  - enabling uniform query capabilities
    - “Give me list of services satisfying C:{a,b,c..} QoS

(6)

Motivations

- Managing small-scale, highly dynamic metadata, as in dynamic Grid/Web Service collections
- Performance limitations in point-to-point service communication approaches for managing stateful service information
- Lack of support for uniform hybrid query capabilities over both static and dynamic context information
- Lack of support for adaptation to instantaneous changes in client demands
- Lack of support for distributed session

(7)

Research Issues I

- Performance
  - Efficient mediator metadata strategies for service communication: high performance and persistency
- Efficient access request distribution
  - How to choose a replica server to best serve a client request?
  - How to provide adaptation to instantaneous changes in client demands?
- Fault-tolerance
  - High availability of information

(8)

Research Issues II

- Consistency
  - Provide consistency across the copies of a replica
- Flexibility
  - Accommodating a broad range of application domains, such as read-dominated and read/write-dominated
- Interoperability
  - Being compatible with a wide range of applications
  - Providing data models and programming interfaces
    - to perform hybrid queries over all service metadata
    - to enable real-time replay/playback or session

(9)

Proposed System: Hybrid WS-Context Service

- Fault-tolerant and high-performance Grid Information Service
  - Caching module
  - Publish/Subscribe for fault tolerance, distribution, consistency enforcement
  - Database backend and Extended UDDI Registry
- WS-I compatible uniform programming interface
  - Specification with abstract data models and a programming interface that combines WS-Context and UDDI in one hybrid service to manage service metadata
  - Hybrid functions operate on both metadata spaces
  - Extended WS-Context functions operate on session metadata
  - Extended UDDI functions operate on

(10)

Distributed HYBRID Grid Information Services

[Figure: architecture diagram. Labels: Client, WSDL, HTTP(S), HYBRID Grid Information Service (GIS) with WS-Context and Extended UDDI interfaces, JDBC, Replica Server-1, Replica Server-2, Replica Server-N, Publisher, Subscriber, Topic Based Publish-Subscribe Messaging System]

(11)

Detailed architecture of the system

[Figure: architecture diagram. Labels: Client, WSDL, HTTP(S), Ext-UDDI, WS-Context, Access, JDBC Handlers, Expeditor, Querying and Publishing, Storage, Sequencer]

(12)

Key Design Features

- External Metadata Service
  - Extended UDDI Service for handling interaction-independent metadata
- Cache
  - Integrated cache for all service metadata
- Access
  - Redirecting client requests to an appropriate replica server
- Storage
  - Replicating data on an appropriate replica server
- Consistency enforcement

(13)

Extended UDDI XML Metadata Service

- An extended UDDI XML Metadata Service
  - Alternative to OGC Web Registry Services
- It supports different types of metadata
  - GIS Metadata Catalog (functional metadata)
  - User-defined metadata ((name, value) pairs)
- It provides unique capabilities
  - Up-to-date service registry information (leasing)
  - Dynamic aggregation of geospatial services
- It enables advanced query capabilities
  - Geo-spatial queries

(14)

TupleSpaces Paradigm and JavaSpaces

- TupleSpaces [Gelernter-99]
  - a data-centric asynchronous communication paradigm
  - communication units are tuples (data structure)
- JavaSpaces [Sun Microsystems]
  - Java-based object-oriented implementation
  - spaces are transactionally secure
    - mutually exclusive access to objects
  - spaces are persistent
    - temporal, spatial uncoupling
  - spaces are associative

(15)

Publish/Subscribe Paradigm and NaradaBrokering

- Publish-Subscribe communication paradigm
  - Message-based asynchronous communication
  - Participants are decoupled both in space and in time
- Open source NaradaBrokering software
  - topic-based publish/subscribe messaging system
  - runs on a network of cooperating broker nodes
  - provides support for a variety of QoSs, such as low

(16)

Caching Strategy

- Integrated caching capability for both UDDI-type and WS-Context-type metadata
  - light-weight implementation of JavaSpaces
  - data sharing, associative lookup, and persistency
  - both WS-Context-type and common UDDI-type standard operations
- The system stores all keys and modest-size values in memory, while large values are stored in the database.
  - We assume that today’s servers are capable of holding such small-size metadata in cache.
  - All modest-size metadata accesses happen in memory
- WS-Context-type metadata is backed up into MySQL
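The storage rule on this slide (all keys and modest-size values in memory, large values in the database, periodic backup) can be sketched roughly as follows. This is a minimal illustration and not the presented implementation: the class name `HybridCache`, the 10 KB size limit, and the use of a plain dictionary in place of the MySQL backend are all assumptions made for the example.

```python
class HybridCache:
    """Illustrative cache: every key lives in memory, large values only in the database."""

    def __init__(self, database, max_in_memory_bytes=10 * 1024):
        self.database = database                      # dict standing in for the database (assumption)
        self.max_in_memory_bytes = max_in_memory_bytes  # size limit is an assumed value
        self.memory = {}    # key -> value, or None when the value lives only in the database
        self.dirty = set()  # in-memory entries not yet backed up

    def put(self, key, value):
        if len(value) <= self.max_in_memory_bytes:
            # Modest-size value: serve it from memory, back it up later.
            self.memory[key] = value
            self.dirty.add(key)
        else:
            # Large value: keep only the key in memory, store the value in the database.
            self.memory[key] = None
            self.dirty.discard(key)
            self.database[key] = value

    def get(self, key):
        value = self.memory[key]  # raises KeyError for unknown keys
        return value if value is not None else self.database[key]

    def backup(self):
        """Write entries changed since the last backup to the database; return how many."""
        for key in self.dirty:
            self.database[key] = self.memory[key]
        count = len(self.dirty)
        self.dirty.clear()
        return count
```

In such a design, `backup()` would be called on a timer (for example every 10 seconds, the backup frequency used in the experiments), so that reads and writes of modest-size metadata never wait on the database.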

(17)

Performance Model and Measurements

                            Average±error (ms)   Stddev (ms)
Hybrid-WS-Context Inquiry   12.29±0.02           0.48
Extended UDDI Inquiry       17.68±0.06           0.84

Test environment: P4, 3.4 GHz, 1 GB memory, Java SDK 1.4.2, both client and services on the same machine

Simulation parameters
Metadata size: 1.7 KB

(18)

Hybrid WS-Context Caching Approach

Persistency investigation

- The figure shows the average execution time for varying backup frequency.
- The system shows stable performance until the backup frequency exceeds once every 10 seconds.

Simulation parameters
Metadata size: 1.7 Kbytes
Observations: 200

[Figure: Round Trip Chart for WS-Context Standard Operations for varying backup-interval times. x-axis: backup-time interval (logarithmic scale); y-axis: time (msec)]

(19)

Hybrid WS-Context Caching Approach

Performance investigation

- 49% performance increase in inquiry functions and 53% performance gain in publication functions compared to the database solution.
- System processing overhead is less than 1 millisecond.

Simulation parameters
Backup frequency: every 10 seconds
Metadata size: 1.7 Kbytes
Registry size: 5000 metadata

[Figure: Round Trip Time Chart for WS-Context Publication Requests. x-axis: repeated test cases; y-axis: time (msec); series: average and STDev for Echo service, memory access, and database access]
[Figure: Round Trip Time Chart for WS-Context Inquiry Requests. x-axis: repeated test cases; y-axis: time (msec); series: Average - Echo service, Average - database access, STDev - Echo service]

(20)

Hybrid WS-Context Caching Approach

Message rate scalability investigation

- This figure shows the system behavior under increasing message rates.
- The system scales up to 940 inquiry messages/second and 480 publication messages/second.

Simulation parameters
Backup frequency: every 10 seconds
Metadata size: 1.7 Kbytes
Registry size: 100 metadata

[Figure: x-axis: message processing rate (messages per second); y-axis: average time (ms) per message]

(21)

Hybrid WS-Context Caching Approach

Message size scalability investigation

- ... under increasing message sizes.
- The system performs well for small-size contexts. Performance remains the same for context payloads between 100 bytes and 10 Kbytes.
- ... under increasing message sizes between 10 KB and 100 KB.
- The system spends an additional ~7

Simulation parameters
Backup frequency: every 10 seconds
Registry size: 5000 metadata
Observations: 200

[Figure: Round Trip Time Chart for WS-Context Standard Operations. x-axis: context payload size (KB) (logarithmic scale); y-axis: average round trip time (milliseconds); series: average and STDev for publication and inquiry]
[Figure: Round Trip Time Chart for WS-Context Publication Operation. x-axis: context payload size (KB); y-axis: time (milliseconds); series: average and STDev for Echo service and database access]

(22)

Access: Request Distribution

- Pub-sub system based message distribution
- Broadcast-based request dissemination based on a hashing scheme
  - Keys are hashed to values (topics) that run from 1 to 1000
  - Each replica holder subscribes to the topics (the hash values) of the keys it holds
  - Each access request is broadcast on the topic corresponding to the key
  - Replica holders unicast a response with a copy of the requested context
- Advantages
  - does not flood the network with access request messages
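The hashing scheme above can be illustrated with a small in-process sketch. It is not the NaradaBrokering API: `InMemoryBroker`, `ReplicaHolder`, `topic_for_key`, and `request_context` are hypothetical names, and the choice of SHA-256 as the hash function is an assumption. Only the topic range 1 to 1000 comes from the slide.

```python
import hashlib

NUM_TOPICS = 1000  # the slide states that topics run from 1 to 1000


def topic_for_key(key):
    """Map a context key to a topic number in 1..NUM_TOPICS (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_TOPICS + 1


class InMemoryBroker:
    """Hypothetical stand-in for a topic-based pub-sub system."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of the topic and collect their replies.
        return [callback(message) for callback in self.subscribers.get(topic, [])]


class ReplicaHolder:
    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.contexts = {}   # key -> context held by this replica
        self.topics = set()  # topics this holder already subscribed to

    def store(self, key, context):
        self.contexts[key] = context
        topic = topic_for_key(key)
        if topic not in self.topics:  # subscribe once per topic
            self.topics.add(topic)
            self.broker.subscribe(topic, self.on_request)

    def on_request(self, key):
        # Only holders of the requested key answer; on the slide this reply
        # is a unicast message back to the requester.
        if key in self.contexts:
            return (self.name, self.contexts[key])
        return None


def request_context(broker, key):
    """Broadcast an access request on the key's topic and gather the replies."""
    replies = broker.publish(topic_for_key(key), key)
    return [reply for reply in replies if reply is not None]
```

Because a request is published only on the topic derived from its key, servers that hold no key hashing to that topic never see the request, which is the stated advantage of not flooding the network.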

(23)

Access Distribution Experiment

Test Methodology

[Figure: test setup with time segments T1, T2, T3; Time = T1 + T2 + T3]

Backup frequency: every 10 seconds

- The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
- We determine the average end-to-end time for metadata access.
- We run the system for 25000 observations.
- Gridfarm and

(24)

Distribution experiment result

- The figure shows average results for every 1000 observations. We have 25000 continuous observations.
- The average transfer time shows that continuous access distribution operation does not degrade performance.
- The figure shows the time required for various activities of access request distribution.
- The average overhead of distribution using the pub-sub system remains the same regardless of the network distances between nodes.

[Figure: bar chart of time (ms) for Bloomington-Indianapolis, Bloomington-Tallahassee, and Bloomington-San Diego; series: latency, overhead of distribution when using one intermediary broker, overhead of distribution when using two intermediary brokers]
[Figure: Bloomington - Indianapolis Access Distribution Chart. x-axis: every 1000 observations; y-axis: time (ms); series: average and STDev for latency, one broker, and two brokers]
[Figure: Bloomington, IN - Tallahassee, Florida Distribution Chart. x-axis: every 1000 observations; y-axis: time (ms); series: average and STDev for latency, one broker, and two brokers]
[Figure: Bloomington - San Diego Distribution Chart. x-axis: every 1000 observations; y-axis: time (ms); series: Average - Latency, STDev - Latency, Average - One Broker, STDev - One Broker, Average - Two Brokers]

(25)

Optimizing Performance: Dynamic migration/replication

- Dynamic migration/replication
  - A methodology for creating temporary copies of a context in the proximity of its requestors.
  - Autonomous decisions
    - replication decision belongs to the server
- Algorithm based on [Rabinovich et al, 1999]
  - The system keeps a popularity record (# of access requests) for each copy and flushes it at regular time intervals
  - The system checks local data periodically for dynamic migration or replication
  - Unpopular server-initiated copies are deleted
  - Popular copies are moved to where they are wanted
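A simplified sketch of the periodic decision is given below. It is not the algorithm of [Rabinovich et al, 1999] itself, only an illustration of the two rules listed above. The threshold values and the 100-second decision interval are the ones listed in the simulation parameters of the replication experiment; the function name `decide`, the per-requester counter, and the rule of replicating toward the single busiest requester are assumptions.

```python
from collections import Counter

DELETION_THRESHOLD = 0.03     # requests/second, from the simulation parameters
REPLICATION_THRESHOLD = 0.18  # requests/second, from the simulation parameters
DECISION_INTERVAL = 100.0     # seconds between replication decisions


def decide(request_counts, server_initiated, interval=DECISION_INTERVAL):
    """Return an (action, target) pair for one context copy.

    request_counts maps a requester location to the number of access
    requests seen since the popularity record was last flushed.
    """
    total_rate = sum(request_counts.values()) / interval
    # Unpopular server-initiated copies are deleted.
    if server_initiated and total_rate < DELETION_THRESHOLD:
        return ("delete", None)
    # Popular copies are placed near the requester that wants them most (assumed rule).
    if request_counts:
        origin, count = request_counts.most_common(1)[0]
        if count / interval >= REPLICATION_THRESHOLD:
            return ("replicate", origin)
    return ("keep", None)
```

For example, 30 requests from one location within a 100-second interval is a rate of 0.3 requests/second, which is above the replication threshold, so `decide(Counter({"indianapolis": 30}), False)` yields a replicate action targeted at that location.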

(26)

[Figure: test setup with time segments T1, T2, T3; Time = T1 + T2 + T3]

Simulation parameters
message size / message rate       2.7 Kbytes / 10 msg/sec
replication decision frequency    every 100 seconds
deletion threshold                0.03 request/second
replication threshold             0.18 request/second
registry size                     1000 metadata in Indianapolis

- The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
- We determine the mean end-to-end time for metadata access.
- We run the system for approximately 45 minutes on Gridfarm and complexity machines.

(27)

- The figure shows average results for every 100 seconds.
- The decrease in average latency shows that the algorithm manages to move replica copies to where they are wanted.

[Figure: Dynamic Replication Performance Chart - Distribution between Bloomington, IN and Indianapolis, IN. x-axis: every 100 sec; y-axis: latency (ms); series: Average - Dynamic Replication, STDev - Dynamic Replication, Average - Distribution]

(28)

Storage: Replica content placement

- Pub-sub system for replica content placement
- Each node keeps a Replica Server Map
  - A newly joining node sends a multicast probe message when it joins a network
  - Each network node responds with a unicast message to make itself discoverable
- Selection of Replica Server(s) for content placement
  - Select a node based on a proximity weighting factor
- Sending storage request to selected replica servers
  - 1st step: the initiator unicasts a storage request to each selected replica server
  - 2nd step: the recipient server stores the context and becomes a subscriber to the topic of that context
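The selection step can be sketched as below. The slide does not define the proximity weighting factor, so this example assumes it is the round-trip time measured from each probe response, with lower values preferred; the function name `select_replica_servers` and the map layout are hypothetical.

```python
def select_replica_servers(replica_server_map, count):
    """Pick the `count` closest servers.

    replica_server_map maps a server address to its proximity weight,
    assumed here to be a measured round-trip time in milliseconds.
    """
    # Sort by weight, breaking ties by address so the result is deterministic.
    ranked = sorted(replica_server_map.items(), key=lambda item: (item[1], item[0]))
    return [address for address, _ in ranked[:count]]
```

The initiator would then unicast a storage request to each address returned, with `count` set to the desired fault-tolerance level (the number of replicas to create).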

(29)

Fault-tolerance experiment

Testing Setup

Backup frequency: every 10 seconds

- The test system consists of NaradaBrokering server(s) and four hybrid WS-Context instances separated by significant network distances.
- We determine the average end-to-end time for replica content creation.
- We run the system continuously for 25000 observations.
- Gridfarm and Teragrid

(30)

Fault-tolerance experiment result

- ... observation. The system was continuously tested for 25000 observations.
- The results indicate that continuous operation does not degrade performance.
- The figure shows the results gathered from the fault-tolerance experiments.
- The overhead of replica creation increases on the order of milliseconds as the fault-tolerance level increases.

[Figure: bar chart of time (ms) for 1 replica creation (Indianapolis), 2 replica creation (Indianapolis, IN - Tallahassee, FL), and 3 replica creation (Indianapolis-IN, Tallahassee-FL, San Diego-CA); series: end-to-end latency, overhead of replica creation when using one intermediary broker, overhead of replica creation when using two intermediary brokers]
[Figure: 1 replica creation at remote location: Indianapolis, IN. y-axis: time (ms); series: average and STDev for latency, one broker, and two brokers]
[Figure: 2 replica creation at remote locations: Indianapolis, Tallahassee - Fault Tolerance Chart. y-axis: time (ms)]
[Figure: 3 replica creation at remote locations: San Diego, Indianapolis and Tallahassee - Fault Tolerance Chart. y-axis: time (ms); series: average and STDev for latency, one broker, and two brokers]

(31)

Consistency enforcement

- Pub-sub system for enforcing consistency
- Primary-copy approach
  - Updates of the same data are carried out at a single server
  - Use of NTP-based synchronized timestamps to impose an order on write operations on the same data
- Update distribution
  - 1st step: An update request is forwarded (unicast) to the primary-copy holder by the initiator
  - 2nd step: The primary-copy holder performs the update request and returns an acknowledgement
- Update propagation
  - The primary copy pushes (broadcasts) updates of a context,
  - on the topic (hash value) corresponding to the key of the context,
  - if the primary copy realizes that there exists a stale copy in the
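The primary-copy rule with timestamp ordering can be sketched as follows. This is an illustration under assumptions, not the presented implementation: the classes `PrimaryCopy` and `Replica` are hypothetical, the `publish` callback stands in for a broadcast on the context's topic, and timestamps are passed in as plain numbers rather than read from NTP-synchronized clocks.

```python
class PrimaryCopy:
    """Single server at which all updates of a given context are applied."""

    def __init__(self, publish):
        self.publish = publish  # callback standing in for a pub-sub broadcast (assumption)
        self.store = {}         # key -> (timestamp, value)

    def update(self, key, value, timestamp):
        """Apply an update if it is newer than the stored one; return an acknowledgement flag."""
        current = self.store.get(key)
        if current is not None and timestamp <= current[0]:
            return False  # older or duplicate write: rejected to keep a single order
        self.store[key] = (timestamp, value)
        self.publish(key, timestamp, value)  # push the update to replica holders
        return True


class Replica:
    """Replica holder that accepts pushed updates in timestamp order."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def on_update(self, key, timestamp, value):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)
```

Comparing timestamps on both sides means a late or repeated message cannot overwrite a newer value, so all copies converge on the write with the highest timestamp.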

(32)

Consistency Enforcement Experiment

Test Methodology

[Figure: test setup with time segments T1, T2, T3; Time = T1 + T2 + T3]

Backup frequency: every 10 seconds
Message size: 2.7 Kbytes

- The test system consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
- We determine the average time required for enforcing consistency.
- We run the system for 25000 observations.
- Gridfarm and

(33)

Consistency Enforcement Test Result

- ... observation. We have 25000 continuous observations.
- The average transfer time shows that continuous operation does not degrade performance.
- The figure shows the results gathered from the consistency experiments.
- The results indicate that the overhead of consistency enforcement is in milliseconds and the cost remains the

[Figure: bar chart of time (ms) for Bloomington-Indianapolis, Bloomington-Tallahassee, and Bloomington-San Diego; series: latency, overhead of distribution when using one intermediary broker, overhead of distribution when using two intermediary brokers]
[Figure: Bloomington - Indianapolis Consistency Enforcement Chart. x-axis: every 1000 observations; y-axis: time (ms); series: average and STDev for latency, one broker, and two brokers]
[Figure: Bloomington, IN - Tallahassee, Florida Consistency Enforcement Chart. y-axis: time (ms); series: average and STDev for latency, one broker, and two brokers]
[Figure: Bloomington - San Diego Consistency Enforcement Chart. y-axis: time (ms)]

(34)

Comparison of Experiment Results

- The figure shows the results gathered from the distribution, fault-tolerance and consistency experiments.
- The results indicate that the overhead of integrating JavaSpaces with the pub-sub system for distribution, fault-tolerance, and consistency enforcement is on the order of milliseconds.

[Figure: bar chart of time (ms) for distribution, consistency enforcement, and fault tolerance (3 replica creation); series: One Broker]

(35)

Contribution

- We have shown that communication among services can be achieved with efficient mediator metadata strategies.
  - Efficient mediator services allow us to perform collective operations such as queries on subsets of all available metadata in service conversation.
- We have shown that an efficient decentralized metadata system can be built by integrating JavaSpaces with the Publish/Subscribe paradigm.
  - Fault-tolerance, distribution and consistency can be achieved with a few milliseconds of system processing overhead.
- We have shown that adaptation to instantaneous changes in client demands can be achieved in decentralized metadata management.
- We have introduced data models and programming