
XML Metadata Services

SKG06 http://www.culturegrid.net/SKG2006/

Guilin China

November 3 2006

Mehmet S. Aktas, Sangyoon Oh, Geoffrey C. Fox and Marlon Pierce

Presented by Geoffrey Fox: Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401

Different Metadata Systems

• There are many WS-* specifications addressing metadata, defined broadly:
  WS-MetadataExchange, WS-RF, UDDI, WS-ManagementCatalog, WS-Context, ASAP, WBEM
• And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker
• And of course representations including RDF and OWL
• Further, there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain, such as WFS (Web Feature Service) for GIS (Geographical Information Systems)
• They have different scope and different QoS trade-offs
  • e.g. Distributed Hash Tables (Chord) to achieve scalability in large scale networks

Different Trade-offs

• It has never been clear how a poor lonely service is meant to know where to look up metadata, or whether that metadata is meant to be thought of as a database (UDDI, WS-Context) or as the contents of a message (WS-RF, WS-MetadataExchange)
• We identified two very distinct QoS trade-offs:
  1) Large-scale, relatively static metadata, as in a (UDDI) catalog of all the world’s services
  2) Small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
     • Fault tolerance and the ability to support dynamic changes with a delay of a few milliseconds
     • But only a modest number of involved services (up to 1000’s in a session)

WS-Context compliant XML Metadata Services

• We designed and built WS-Context compliant XML Metadata Services supporting distributed or central paradigms.
• This service supports the extensive metadata requirements of rich interacting systems, such as:
  • correlating activities of widely distributed services, e.g. workflow-style GIS Service Oriented Architectures
  • optimizing Grid/Web Service messaging performance, e.g. mobile computing environments
  • managing dynamic events, especially in multimedia collaboration, e.g. collaboration Grid/Web Service applications
  • providing information to enable session failure recovery

Context as Service Metadata

• We define all metadata (static, semi-static, dynamic) relevant to a service as “Context”.
• Context can be associated with a single service, a session (service activity) or both.
• Context can be independent of any interaction
  • slowly varying, quasi-static context
  • Ex: type or endpoint of a service, less likely to change
• Context can be generated as a result of service interactions
  • dynamic, highly updated context
  • information associated with an activity or session
  • Ex: session-id, URI of the coordinator of a

Hybrid XML Metadata Services -> WS-Context + extended UDDI

• We combine the functionalities of two services, WS-Context and extended UDDI, in one hybrid service to manage Context (service metadata).
  • WS-Context: controlling a workflow
  • (Extended) UDDI: supporting semantic service discovery
• This approach enables uniform query capabilities on the service metadata catalog.

Distributed Hybrid WS-Context XML Metadata Services

[Figure: architecture diagram. Clients call the services through WSDL interfaces over HTTP(S). Replica Server-1 through Replica Server-N each host a Hybrid-WSContext Service with a database accessed through JDBC; an Extended UDDI Service with its own database also appears. The replicas act as publishers and subscribers on a topic-based publish-subscribe messaging system.]
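The replication idea in this architecture can be illustrated with a minimal sketch. It assumes an in-process stand-in for the topic-based publish-subscribe system; the names `TopicBroker`, `Replica`, `set_context`, `get_context` and the topic name `context-updates` are hypothetical and are not taken from the actual service.

```python
from collections import defaultdict


class TopicBroker:
    """Hypothetical in-process stand-in for a topic-based publish-subscribe system."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in list(self._subscribers[topic]):
            callback(message)


class Replica:
    """One replica server holding its own copy of the context store."""

    TOPIC = "context-updates"  # assumed topic name, for illustration only

    def __init__(self, name, broker):
        self.name = name
        self.store = {}
        self._broker = broker
        broker.subscribe(self.TOPIC, self._on_update)

    def _on_update(self, message):
        key, value = message
        self.store[key] = value

    def set_context(self, key, value):
        # Publishing also reaches this replica through its own subscription.
        self._broker.publish(self.TOPIC, (key, value))

    def get_context(self, key):
        return self.store.get(key)


broker = TopicBroker()
replicas = [Replica(f"replica-{i}", broker) for i in range(1, 4)]
replicas[0].set_context("session-42", "coordinator-uri")
```

After the single `set_context` call every replica can answer `get_context("session-42")` from its local copy, which is the property the publish-subscribe layer is used for here.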

Key Features

• Publish-subscribe is exploited to support replicated storage, e.g.
  • Initial storage of context
  • Updates to make copies consistent
  • Access to context
• Use of a JavaSpaces cache running in memory on each WS-Context node
  • Naturally supports Get Context by name requests
  • Backed up every ~30 milliseconds to a MySQL database
• If a query can be satisfied by the JavaSpaces cache, it can be satisfied in < 1 ms plus the few milliseconds of
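The cache-plus-backup behaviour listed above can be sketched as a write-behind cache. This is only an illustration under stated assumptions: a plain dict stands in for the MySQL database, the class and method names are hypothetical, and the periodic timer that would call `backup()` every ~30 milliseconds is left out so the example stays deterministic.

```python
class WriteBehindCache:
    """In-memory context cache that is periodically copied to a backing store."""

    def __init__(self, backing_store):
        self._cache = {}               # name -> context, served from memory
        self._dirty = set()            # names changed since the last backup
        self._backing = backing_store  # plain dict standing in for the database

    def set_context(self, name, context):
        self._cache[name] = context
        self._dirty.add(name)

    def get_context(self, name):
        # Fast path: answer by-name requests from memory.
        if name in self._cache:
            return self._cache[name]
        # Slow path: fall back to the backing store and cache the result.
        context = self._backing.get(name)
        if context is not None:
            self._cache[name] = context
        return context

    def backup(self):
        # A timer would call this periodically (the slides mention ~30 ms).
        for name in self._dirty:
            self._backing[name] = self._cache[name]
        written = len(self._dirty)
        self._dirty.clear()
        return written


database = {}
cache = WriteBehindCache(database)
cache.set_context("session-42", "http://example.org/coordinator")
in_db_before_backup = "session-42" in database
written = cache.backup()
```

The design choice this shows is the trade-off named on the slide: reads and writes touch only memory, and at most one backup interval of updates is at risk if a node fails before `backup()` runs.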

TupleSpaces-Based Caching Strategies

• TupleSpaces is a communication paradigm
  • asynchronous communication
  • pioneered by David Gelernter
  • first described in the Linda project in 1982 at Yale
  • communication units are tuples: data structures consisting of one or more typed fields
• The Hybrid WS-Context Service employs/extends TupleSpaces:
  • all accesses are in memory, so overhead is negligible (less than 1 msec for inquiries)
  • data sharing: mutually exclusive access to tuples
  • associative lookup: content-based search, appropriate for key-based caching
  • temporal and spatial uncoupling of communicating parties
• e.g. a tuple: ("context_id", Context). This indicates a tuple with two fields: a) a string, "context_id", and b) a Java object, "Context".
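A minimal sketch of the tuple-space operations described above, written in Python rather than with the JavaSpaces API. The class `TupleSpace` and its `write`, `read` and `take` methods are illustrative assumptions; `None` in a template is used as a wildcard field, and the lookups are non-blocking for simplicity.

```python
import threading


class TupleSpace:
    """Toy tuple space with associative (template-based) lookup."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()  # mutually exclusive access to tuples

    @staticmethod
    def _matches(template, candidate):
        # A template matches when lengths agree and every non-None field is equal.
        if len(template) != len(candidate):
            return False
        return all(t is None or t == c for t, c in zip(template, candidate))

    def write(self, tup):
        with self._lock:
            self._tuples.append(tuple(tup))

    def read(self, template):
        # Return a matching tuple without removing it, or None if absent.
        with self._lock:
            for tup in self._tuples:
                if self._matches(template, tup):
                    return tup
        return None

    def take(self, template):
        # Return and remove a matching tuple, or None if absent.
        with self._lock:
            for index, tup in enumerate(self._tuples):
                if self._matches(template, tup):
                    return self._tuples.pop(index)
        return None


space = TupleSpace()
space.write(("context_id", {"session": "abc"}))
found = space.read(("context_id", None))
```

Looking up by the first field alone is the key-based retrieval the slide refers to: the writer and the reader never address each other directly, only the shared space.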

Managing Context: UDDI and WS-Context

• Purpose
  • UDDI: standard way of publishing and discovering generic Web Service information
  • WS-Context: standard way of maintaining distributed session state information
• Metadata characteristics
  • UDDI: interaction-independent, rarely changing, small size
  • WS-Context: interaction-dependent, highly dynamic, small size
• Types of typical queries
  • UDDI: high degree of complexity in inquiry arguments to improve the selectivity and increase the precision of the search results
  • WS-Context: simplicity in inquiry arguments, mostly key-based retrieval queries; the selectivity of queries is one
• Scalability
  • UDDI: whole Grid; UDDI is a domain-independent service for generic service metadata
  • WS-Context: sub-Grids; a modest number of interacting Web Services participating in an activity
• Desired features
  • better expressive power for service metadata (e.g., RDF-enabled UDDI Registries), up-to-date service entries (e.g., leasing capable UDDI Registries), domain-specific capabilities (e.g., geospatial query capabilities), persistent storage

A general performance evaluation on the most recent implementation

Prototype Evaluation - I

• Performance Experiment: We investigate the practical usefulness of the system by exploring the following research questions.
  • What is the baseline performance of the hybrid WS-Context Service implementation for given standard operations?
  • What is the effect of the network latency on the baseline performance of the system?

[Figure: four test setups. In each, a single-threaded client (1 user/1000 transactions) calls the server through its WSDL interface.]

Test-1. Dummy Server: a client calls a Dummy Server
Test-2. Hybrid-WSContext inquiry/publication without database access: a WS-Context client calls the Hybrid-WSContext Service (Publishing, Querying Module, JDBC Handler, Expeditor)
Test-3. Hybrid-WSContext inquiry/publication with database access: a WS-Context client calls the Hybrid-WSContext Service (Publishing, Querying Module, JDBC Handler, Expeditor)
Test-4. extended UDDI inquiry/publication: an extended UDDI client calls the Extended UDDI Server (Extended UDDI Server Engine)

The experimental study indicates that the proposed system can provide comparable performance for standard operations with the existing metadata

TESTBED: Cluster node configuration
  Processor           Intel® Xeon™ CPU (2.40GHz)
  RAM                 2GB total
  Network Bandwidth   900 Mbits/sec.[1] (among the cluster nodes)
  OS                  GNU/Linux (kernel release 2.4.22)
  Java Version        Java 2 platform, Standard Edition (1.4.2-beta-b19)
  SOAP Engine         Axis 2 (in Tomcat 5.5.8)

[Figure: Round Trip Time Chart for Inquiry Requests. Y axis: average response time (msec) per request. Series: Test-1: Dummy service; Test-2: WS-Context inquiry with memory access; Test-3: WS-Context inquiry with database access; Test-4: UDDI inquiry]

  Metadata Services    Avg. latency for inquiries
  hybrid WS-Context    8.41 ms
  extended UDDI        17.5 ms
  JUDDI                40 ms
  UDDI-MT              20.37 ms
  JWSD                 18.99 ms

Prototype Evaluation - II

• Scalability Experiment: We investigate the scalability of the system by finding answers to the following research questions.
  • What is the performance degradation of the system for standard operations under increasing message sizes?
  • What is the performance degradation of the system for standard operations under increasing message rates?

TEST-1 - Hybrid-WSContext inquiry/publication with increasing message sizes

[Figure: a single-threaded WS-Context client (1 user/100 transactions) calls the Hybrid FTHPIS-WSContext Service (Publishing, Querying Module, JDBC Handler, Expeditor) through its WSDL interface.]

TEST-2 - Hybrid-WSContext inquiry/publication with increasing message rates (# of messages per

[Figure: clients with thread pools call the Hybrid-WSContext Service (Publishing, Querying Module, JDBC Handler, Expeditor) over HTTP(S) through its WSDL interface.]

5 Client distributed to cluster nodes 1 to 5, with each running

[Figure: average round trip time (milliseconds) versus context payload size (KB), from 0.1 to 100.0. Series: Tinquiry = T(RTT), Tpublication = T(RTT)]

The results indicate that the cost of inquiry and publication operations remains the same as the context’s payload size increases from 100 Bytes up to 10 KBytes. We also see that the hybrid WS-Context service performs better than the OGSA-DAI approach, although the latter technology is more powerful.

TESTBED: Cluster node configuration for hybrid WS-Context tests
  Processor           Intel® Xeon™ CPU (2.40GHz)
  RAM                 2GB total
  Network Bandwidth   900 Mbits/sec.[1] (among the cluster nodes)
  OS                  GNU/Linux (kernel release 2.4.22)
  Java Version        Java 2 platform, Standard Edition (1.4.2-beta-b19)
  SOAP Engine         Axis 2 (in Tomcat 5.5.8)

  Metadata Services    Avg. latency for inquiries for 64 KByte data retrieval
  hybrid WS-Context    14.55 ms
  OGSA-DAI WSRF 2.1    232 ms

=> OGSA-DAI results are from http://www.ogsadai.org.uk/documentation/scenarios/performance

The results indicate that the proposed system can scale up to 940 simultaneous querying clients or 222 simultaneous publishing clients, where each client sends one query per second, for small context payloads with 30 milliseconds fault

TESTBED: Cluster node configuration
  Processor           Intel® Xeon™ CPU (2.40GHz)
  RAM                 2GB total
  Network Bandwidth   900 Mbits/sec.[1] (among the cluster nodes)
  OS                  GNU/Linux (kernel release 2.4.22)
  Java Version        Java 2 platform, Standard Edition (1.4.2-beta-b19)
  SOAP Engine         Axis 2 (in Tomcat 5.5.8)

[Figure: average round trip time (ms) versus message rate (messages per second), from 0 to 1000]

Axis2 Performance on Multicore Machines

[Figure: round trip time (ms) versus messages per second, from 0 to 3500. Series: Grid Farm; Sun Fire - 6 Cores; Sun Fire - 8 Cores; HP xw9300; Dell Intel Xeon. Annotations: 2 chips with 2 cores/chip; 2 chips with 1 core/chip; 1 chip with 8 cores/chip; 1 chip with 6 cores/chip; Xeon; Opteron]

DISTRIBUTION TEST

• We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
• 5 clients are distributed to cluster nodes 1 to 5, each running 1 to 15 threads firing messages to randomly selected servers.
• Numbered rectangle shapes correspond to an N-node FTHPIS system with various publish-subscribe topologies (this does NOT affect performance).
• 5 different FTHPIS systems are tested, with N ranging from 1 to 5, under the same workload.

[Figure: clients with thread pools send requests over HTTP(S) through WSDL interfaces to FTHPIS configurations of 1 to 5 nodes (node-1 to node-5)]

The results indicate that the scalability of the metadata store can be increased when moving from a centralized service to a distributed system.

TESTBED: Cluster node configuration
  Processor           Intel® Xeon™ CPU (2.40GHz)
  RAM                 2GB total
  Network Bandwidth   900 Mbits/sec.[1] (among the cluster nodes)
  OS                  GNU/Linux (kernel release 2.4.22)
  Java Version        Java 2 platform, Standard Edition (1.4.2-beta-b19)
  SOAP Engine         Axis 2 (in Tomcat 5.5.8)

[Figure: message rate (msg/second), from 900 to 1300, versus number of nodes (1 to 5)]

Hybrid WS-Context inquiry operation

  # of nodes   message rate (msg/second)   mean ± error (ms)   Stdev (ms)
  1            940                         47.05 ± 0.24        33.52
  2            1005                        40.76 ± 0.43        38.22
  3            1082                        38.58 ± 0.45        34.93
  4            1148                        36.28 ± 0.42        32.24
  5            1221                        34.13 ± 0.4         30.76
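To put the throughput figures in the table above into perspective, the following short calculation derives the speedup relative to the single-node case. The message rates are copied from the table; the "efficiency" value (speedup divided by node count) is a metric introduced here for illustration and does not appear in the slides.

```python
# Message rates (msg/second) per number of nodes, copied from the table.
rates = {1: 940, 2: 1005, 3: 1082, 4: 1148, 5: 1221}

baseline = rates[1]

# Speedup relative to the centralized (1-node) service.
speedups = {nodes: rate / baseline for nodes, rate in rates.items()}

# Fraction of ideal linear scaling achieved (illustrative metric, not from the slides).
efficiency = {nodes: speedups[nodes] / nodes for nodes in rates}

for nodes in sorted(rates):
    print(f"{nodes} node(s): speedup {speedups[nodes]:.2f}x, "
          f"efficiency {efficiency[nodes]:.0%}")
```

With five nodes the sustained rate is about 1.3 times the single-node rate, so throughput does grow with distribution but well below linear scaling.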

The caching algorithm is non-optimal, as it does database access BEFORE Publish-Subscribe. Reversing this choice should lead to throughput

Prototype Evaluation - III

• Fault Tolerance Experiment: We investigate the empirical cost of having fault tolerance by finding answers to the following research questions.
  • What is the cost of fault tolerance in terms of execution time of standard operations on a tight cluster?
  • How does the cost of fault-tolerance change when the replica

[Figure: two setups, each with a client and five replica nodes (node-1 to node-5); one setup also labels link-1 to link-4]

Test-1. LAN experiment. All nodes and the client are located on a tightly coupled local area network.

Test-2. WAN experiment. Nodes are located on a loosely coupled wide area network.

FAULT-TOLERANCE EXPERIMENT TEST

Summary of machine configurations

• gf6.ucs.indiana.edu (Bloomington, IN, USA)
  Processor: Intel® Xeon™ CPU (2.40GHz); RAM: 2GB; OS: GNU/Linux (kernel release 2.4.22); Java Version: Java 2, STE, (1.4.2-beta-b19)
• complexity.ucs.indiana.edu (Indianapolis, IN, USA)
  Processor: Sun-Fire-880, sun4u sparc SUNW; RAM: 16GB; OS: SunOS 5.9; Java Version: Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)
• lonestar.tacc.utexas.edu (Austin, TX, USA)
  Processor: Intel(R) Xeon(TM) CPU 3.20GHz; RAM: 4GB; OS: GNU/Linux (kernel release 2.6.9); Java Version: Java 2, STE, (1.4.2-beta-b19)
• tg-login.sdsc.teragrid.org (San Diego, CA, USA)
  Processor: GenuineIntel IA-64, Itanium 2, 4 processors; RAM: 8GB; OS: GNU/Linux; Java Version: Java 2, STE, (1.4.2-beta-b19)
• vlab2.scs.fsu.edu (Tallahassee, FL, USA)
  Processor: Dual Core AMD Opteron(tm) Processor 270; RAM: 2GB; OS: GNU/Linux (kernel release 2.6.16); Java Version: Java 2, STE, (1.4.2-beta-b19)

FAULT-TOLERANCE TEST RESULTS

[Figure: time (msec) versus number of replicas (1 to 5). Series: Test1 - LAN testing case, publication; Test2 - WAN testing case, publication; Test3 - Inquiry operation (request granted locally with memory access); Test4 - Inquiry operation (request granted locally with database access)]

The results point out the inevitable trade-off between fault tolerance (degree of replication, or high availability of data) and performance. The lower the level of fault tolerance, the higher the performance for publication operations.

An Application Case Scenario and an application-specific performance evaluation

• Handheld Flexible Representation (HHFR) is open source software for fast communication in mobile Web Services. HHFR supports streaming messages, separation of message contents, and usage of a context store.
  http://www.opengrids.org/hhfr/index.html
• We use the WS-Context service as a context store for redundant message parts of the SOAP messages.
  • Redundant data is static XML fragments encoded in every SOAP message
  • Redundant metadata is stored as context associated with the service conversation in place
• The empirical results show that we gain 83% in message size and on average 41% in transit time by using the WS-Context service.

Optimizing Grid/Web Service Messaging Performance

[Figure: an HHFR Endpoint (Mobile) and an HHFR Endpoint (Conventional) negotiate over SOAP and then exchange a stream of messages in the preferred representation. The Context-store holds the HHFR scheme, representation, headers, and stream info; endpoints save context with setContents and retrieve context with getContents.]

Performance with and without Context-store

  Message size        Without Context-store (sec)    With Context-store (sec)
                      Ave. ± error    Stddev         Ave. ± error    Stddev
  Medium: 513 byte    2.76 ± 0.034    0.187          1.75 ± 0.040    0.217
  Large: 2.61KB       5.20 ± 0.158    0.867          2.81 ± 0.098    0.538

• Experiments ran over HHFR
• Optimized messages are exchanged over HHFR after saving redundant/unchanging parts to the Context-store
• Save on average 83% of message size, 41% of transit time
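The 41% transit-time figure can be checked against the measurements in the table. The sketch below assumes that the figure is the unweighted mean of the relative savings for the two message sizes; that averaging rule is an assumption, since the slides do not state how the average was formed.

```python
# (without Context-store, with Context-store) transit times in seconds, from the table.
measurements = {
    "medium (513 byte)": (2.76, 1.75),
    "large (2.61KB)": (5.20, 2.81),
}

# Relative saving for each message size.
savings = {
    label: (without - with_store) / without
    for label, (without, with_store) in measurements.items()
}

# Unweighted mean over the two rows (assumed averaging rule).
average_saving = sum(savings.values()) / len(savings)

for label, value in savings.items():
    print(f"{label}: {value:.1%} saved")
print(f"average: {average_saving:.1%} saved")
```

Under that assumption the two rows give savings of roughly 37% and 46%, whose mean rounds to the 41% quoted on the slide.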

System Parameters

• Taccess: time to access a Context-store (i.e. save a context to or retrieve a context from the Context-store) from a mobile client
• TRTT: Round Trip Time to exchange a message through an HHFR channel
• N: number of simultaneous streams supported by stream summed over ALL mobile clients
• Twsctx: time to process the setContext operation
• Taxis: time consumed for Axis processing
• Ttrans: transmission time through the network

Context-store: System Parameters

[Figure: a Mobile Client (Endpoint B) and a Service Provider (Endpoint A) communicate over the high performance channel of HHFR (TRTT). Access to the Context-store (Information Service) passes through the client, the network, Axis, and WS-CTX.]

Taccess = Taxis + Twsctx + Ttrans

Summary of Taxis and Twsctx measurements

• Taccess = Twsctx + Taxis + Ttrans
• Data binding overhead at the Web Service container is the dominant factor in message processing

[Figure: time (msec) chart; recoverable series label: Twsctx]

Performance Model and Measurements

• Chhfr = n·thhfr + Oa + Ob
• Csoap = n·tsoap
• Breakeven point: nbe·thhfr + Oa + Ob = nbe·tsoap
• Oa(WS) is roughly 20 milliseconds

                               Average ± error (sec)    Stddev (sec)
  Context-store Access (Oa)    4.127 ± 0.042            0.516
  Negotiation (Ob)             5.133 ± 0.036            0.825

Oa : overhead for accessing the
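Solving the breakeven equation for nbe gives nbe = (Oa + Ob) / (tsoap - thhfr), which is defined only when a SOAP message costs more than an HHFR message. The sketch below evaluates that expression. Oa and Ob are the measured values from the table; the per-message times `t_hhfr` and `t_soap` are hypothetical placeholders chosen only to make the example concrete, since the slides do not list them here.

```python
import math


def breakeven_messages(t_hhfr, t_soap, o_a, o_b):
    """Number of messages at which HHFR and plain SOAP cost the same."""
    if t_soap <= t_hhfr:
        raise ValueError("HHFR never breaks even unless t_soap > t_hhfr")
    return (o_a + o_b) / (t_soap - t_hhfr)


def cost_hhfr(n, t_hhfr, o_a, o_b):
    # Chhfr = n * thhfr + Oa + Ob
    return n * t_hhfr + o_a + o_b


def cost_soap(n, t_soap):
    # Csoap = n * tsoap
    return n * t_soap


O_A = 4.127   # Context-store access overhead (sec), from the table
O_B = 5.133   # Negotiation overhead (sec), from the table
T_HHFR = 0.2  # hypothetical per-message time over HHFR (sec)
T_SOAP = 0.5  # hypothetical per-message time over SOAP (sec)

n_be = breakeven_messages(T_HHFR, T_SOAP, O_A, O_B)
first_winning_n = math.ceil(n_be)
```

With these placeholder per-message times the fixed overheads are amortized after about 31 messages; shorter streams would be cheaper over plain SOAP.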

String Concatenation

• Measure the total time to process a stream
• Independent variables
  • Number of messages per stream
  • Size of the message

[Figure: time for finishing message stream (sec) versus number of messages per stream. Series: HHFR: 16 String Per Message; SOAP: 16 String Per Message]
