Approaches to the Integration of Distributed and Heterogeneous Data Resources

Academic year: 2020


Ahmet Sayar

Indiana University

Motivation

• Integrating data from multiple data sources
• Distributed querying of and transactions on data
• Defining and adopting models for data, metadata and their storage
• Accessing the data seamlessly

Outline

Data Integration Approaches

– Application Specific Solutions

Application-Integration Framework

ASIS (Application Specific Information System)

– Database Federation

• Ogsa-DAI (Ogsa-Data Access and Integration)

Compare ASIS with Ogsa-DAI

– Digital Libraries

• SRB (Storage Resource Broker)
• Sompel’s Digital Library Approach

Application Specific Solutions

• The most common means of data integration
• Expensive in terms of time and skills
• Developing and using them requires deep system knowledge
• Better results for special-purpose applications
• Fragile
– Changes to the underlying sources may easily break the application
• Hard to extend


Application-Integration Framework

• Can also be called a component-based framework
– Such as CORBA, or Filters with common interfaces
• Does not necessarily address data integration issues
• Based on a common data model (such as CML and GML)
– With adaptors, if a source changes the adaptor may have to change, but the application may never see the change.
• Adding a new source is easy
– A new adaptor may need to be written.
– The adaptor may already exist online.
• No need for detailed system knowledge
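
The adaptor idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the common data model (a list of dictionaries with `name`, `lat` and `lon`), the two source formats, and the class names are hypothetical and do not come from ASIS or any real framework. It only shows that the application code depends on the common model, so a change in a source is absorbed by its adaptor.

```python
import csv
import io
import json

# Hypothetical common data model: every source is exposed as a list of
# {"name": ..., "lat": ..., "lon": ...} dictionaries.


class CsvAdaptor:
    """Adapts a CSV source (columns: name,lat,lon) to the common model."""

    def __init__(self, text):
        self.text = text

    def features(self):
        rows = csv.DictReader(io.StringIO(self.text))
        return [
            {"name": r["name"], "lat": float(r["lat"]), "lon": float(r["lon"])}
            for r in rows
        ]


class JsonAdaptor:
    """Adapts a JSON source that uses different field names (label, y, x)."""

    def __init__(self, text):
        self.text = text

    def features(self):
        return [
            {"name": o["label"], "lat": o["y"], "lon": o["x"]}
            for o in json.loads(self.text)
        ]


def integrate(adaptors):
    """The application sees only the common model, never the source formats."""
    merged = []
    for adaptor in adaptors:
        merged.extend(adaptor.features())
    return merged


# Made-up sample data for the two hypothetical sources.
csv_source = "name,lat,lon\nfault_a,34.1,-118.2\n"
json_source = '[{"label": "fault_b", "y": 35.0, "x": -117.5}]'
features = integrate([CsvAdaptor(csv_source), JsonAdaptor(json_source)])
```

Adding a third source in this sketch means writing one more class with a `features()` method; `integrate` stays unchanged.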

ASIS (1)

• Enables inter-service communication through well-defined service interfaces, message formats and capabilities metadata.
• Data model is ASL (Application Specific Language)
• Metadata model is the capability document
• Data and metadata have common predefined schemas
• Components are Filter Services
– Web Services with common service interfaces defined in WSDL
– Information/data services enabling distributed access, querying and transformation through their predictable input/output interfaces.

ASIS (2)

• Data and data storage model
– Any data can be integrated into the system after being transformed to ASL.
– Heterogeneity is handled at the end-Filters with adaptors.
– ASL is a community-accepted application specific language
• GML (Geography Markup Language) in GIS applications
• CML (Chemical Markup Language) in Chemistry applications
– Filter’s common service interfaces
• getCapabilities, getData, getFeatureInfo
– Requests to Filter’s interfaces
• getCapabilitiesReq, getDataReq, getFeatureInfoReq

ASIS (3)

• Metadata and metadata storage model:
– Data integration is done through Filters’ capability metadata
– Metadata is stored in the local Filter’s file system as a flat file.
– Capability:
• Inspired by the OGC WMS capability specification.
• Looks like the Dublin Core format.
• A capability-like structure is also used in Gannon’s approach (XPOLA) for Grid services’ security issues.
• Describes dynamic Web/Grid resources.
• Updated manually or dynamically.
• Consists of descriptor, service and provider metadata

ASIS (4)

Data Access and Filter Chaining

Filter Name | Initial Data Provided | After Chaining Data Provided
F1          | None                  | Earth, Fault and State Boundary
F2          | Earth (raster)        | Earth and Fault

[Figure: chain of Filters F1, F2, F3 and F4 with the layers Earth, Fault and State Boundary]

• Each Filter is capable of acting as both a server and a client
• Capability integration is done through the “getCapabilities” service interface
• Requests for common service interfaces are created in accordance with a predefined XML schema
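
A rough Python sketch of capability federation along a Filter chain follows. The `Filter` class, its method names and the chain topology (F1 calling F2 and F4, F2 calling F3) are assumptions chosen so that F1 and F2 end up with the layers listed for them on this slide; the real services exchange XML capability documents over Web Service interfaces, which this in-memory sketch does not attempt to reproduce.

```python
class Filter:
    """Hypothetical Filter: serves its own layers and those of upstream Filters."""

    def __init__(self, name, own_layers=(), upstream=()):
        self.name = name
        self.own_layers = list(own_layers)
        self.upstream = list(upstream)

    def get_capabilities(self):
        # Acting as a client: ask each upstream Filter for its capabilities,
        # then federate them with the locally provided layers.
        layers = list(self.own_layers)
        for other in self.upstream:
            for layer in other.get_capabilities():
                if layer not in layers:
                    layers.append(layer)
        return layers

    def get_data(self, layer):
        # Acting as a server: answer locally or forward along the chain.
        if layer in self.own_layers:
            return f"{layer} data from {self.name}"
        for other in self.upstream:
            if layer in other.get_capabilities():
                return other.get_data(layer)
        raise KeyError(layer)


# Assumed topology, not taken from the original figure.
f3 = Filter("F3", ["Fault"])
f4 = Filter("F4", ["State Boundary"])
f2 = Filter("F2", ["Earth"], upstream=[f3])
f1 = Filter("F1", [], upstream=[f2, f4])

f1_layers = f1.get_capabilities()
f2_layers = f2.get_capabilities()
fault = f1.get_data("Fault")
```

In this sketch F1 provides no data of its own, yet after chaining it can advertise and serve every layer reachable through its upstream Filters.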


Database Federation

• Middleware consisting of a database management system
• Uniform access to a number of heterogeneous data sources
• Provides a query language used to combine, contrast, analyze and manipulate the data
• Data integration is done through database integration.
• Combines data from multiple sources in a single SQL statement – query recreation.
• Ex. Ogsa-DAI (Open Grid Service Architecture – Data Access and Integration)
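
The idea of combining several sources in a single SQL statement can be sketched with Python's standard `sqlite3` module. This is only a stand-in: both sources here are SQLite databases attached to one connection, whereas a federation middleware would sit in front of heterogeneous systems. The table names, column names and rows are made up for the example.

```python
import sqlite3

# Two separate in-memory databases play the role of two data sources.
conn = sqlite3.connect(":memory:")                      # first source ("main")
conn.execute("ATTACH DATABASE ':memory:' AS seismic")   # second source

conn.execute("CREATE TABLE main.faults (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE seismic.events (fault_id INTEGER, magnitude REAL)")
conn.executemany(
    "INSERT INTO main.faults VALUES (?, ?)",
    [(1, "San Andreas"), (2, "Hayward")],
)
conn.executemany(
    "INSERT INTO seismic.events VALUES (?, ?)",
    [(1, 6.0), (1, 4.5), (2, 5.1)],
)

# One SQL statement that joins data held in both sources.
rows = conn.execute(
    """
    SELECT f.name, MAX(e.magnitude)
    FROM main.faults AS f
    JOIN seismic.events AS e ON e.fault_id = f.id
    GROUP BY f.name
    ORDER BY f.name
    """
).fetchall()
conn.close()
```

The caller writes the query as if all tables lived in one database; locating and reaching each source is left to the layer underneath.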

Ogsa-DAI (1)

• Provides a common Java API for accessing and integrating data resources, such as relational and XML databases and files, in a Grid environment
• Specifically designed for the OGSA architecture
• SQL queries on relational resources and XPath statements on XML collections
• Provides data pipelining (similar to Filter chaining) via an XML document called the “perform” document.

Ogsa-DAI (2)

• Data and storage model:
– Any data stored in XML or relational databases, or files
– No common data model
– Data is provided through GDS (Grid Data Services)
– Uses Ogsa-DQP (Distributed Query Processor) to coordinate access to multiple data services
– The enactment engine is the core of Ogsa-DAI. It orchestrates the running of the perform document
– Information in the perform document includes:
• The list of activities and their XML schemas and implementation classes.

Ogsa-DAI (3)

• Metadata storage model:
– Metadata is kept in the Metadata Catalog Service (MCS)
– MCS enables attribute-based querying
– Metadata is for the datasets; the data can be anything (binary, text ...)
– Data integration is done through an XML-based activity file mixing activities (in SQL queries) and metadata
• Simple data access scenario
– A client contacts a DAISGR (DAI Service Group Registry) first to locate the GDSFs (Grid Data Service Factories).
– It accesses suitable GDSFs directly to find out more about their properties and the data resources they represent.
– It asks a GDSF to instantiate a GDS

Ogsa-DAI (4)

• Metadata model:
– No common metadata schema comparable to the ASIS capability
– Defines metadata for the datasets
• No schema in XML
• Stored in database tables as attributes
– Defines metadata for the database system to enable querying and defining activities
• Schema in XML (mcsActivity.xsd schema file)

ASIS vs. Ogsa-DAI

• Ogsa-DAI does not define metadata and data in an XML schema. Metadata is mixed with the database schema. ASIS has predefined data and metadata models.
• Ogsa-DAI uses any data, and has a predefined database schema to enable querying and accessing data.
• ASIS’s data integration is on demand and based on capability federation. In contrast, Ogsa-DAI’s data integration is coded in XML-structured perform and activity documents.
• Ogsa-DAI has a central metadata approach (MCS); ASIS has a distributed metadata approach.
• Both systems are based on Web Services.


Digital Libraries

• Main focus is the publishing and discovery of digital objects.
• Digital objects: a file, a URL, an SQL command string or any string of bits.
• Collect data from multiple different data sources.
• Differ a little from the other data integration approaches
– Data curation services, such as publishing data to and removing data from the data sources.

SRB (1)

• A federated client-server system
• Each server manages/brokers a set of resources
• An implementation architecture for
– Data grids
– Digital libraries
• Storage resources include digital libraries, MSS, UniTree and file systems
• SRB consists of three components
– MCAT services,
– SRB servers to access storage repositories and
– SRB clients
• Mediates access to distributed heterogeneous resources
• Uses MCAT (Metadata Catalog Service) to facilitate

SRB (2)

• Data and storage model:
– Uniform storage interface
– Resource-specific drivers map each storage system to the uniform interface
– Storage resources are registered within SRB as physical resources
– Logical resources (LSR) enable replication.
– LSR = one or more physical resources
– Client API refers to LSRs. Collections are created by LSR
• Metadata storage model (MCAT):
– Serves both core metadata and domain-dependent metadata
– Core metadata follows a standardized schema like Dublin Core
– Stores metadata about data, collections, users, resources and methods
– Attribute-based access, querying and updating of the metadata catalog
– Implemented as a relational database (Oracle, DB2 or Sybase)
– Abstraction and replica information for data
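
How a logical resource and an attribute-based catalog fit together can be sketched in a few lines of Python. The `Catalog` class below is a toy in-memory stand-in, not the MCAT schema or the SRB client API; resource names, file names and attributes are invented for the example.

```python
class Catalog:
    """Toy stand-in for a metadata catalog; not the MCAT schema."""

    def __init__(self):
        self.logical_resources = {}  # logical name -> physical resources
        self.replicas = {}           # data name -> physical resources holding it
        self.attributes = {}         # data name -> {attribute: value}

    def register_logical(self, name, physical):
        self.logical_resources[name] = list(physical)

    def put(self, data_name, logical_resource, **attrs):
        # Writing to a logical resource records one replica per physical
        # resource that belongs to it.
        self.replicas[data_name] = list(self.logical_resources[logical_resource])
        self.attributes[data_name] = dict(attrs)

    def query(self, **wanted):
        # Attribute-based discovery: match on metadata, not on location.
        return sorted(
            name
            for name, attrs in self.attributes.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )


catalog = Catalog()
catalog.register_logical("lsr1", ["unix-disk", "tape-archive"])
catalog.put("survey.dat", "lsr1", author="bob")
catalog.put("notes.txt", "lsr1", author="arthur")

found = catalog.query(author="bob")
locations = catalog.replicas["survey.dat"]
```

The client in this sketch names only the logical resource and the attributes; which physical resources hold the replicas is a detail kept in the catalog.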

SRB (3)

• Metadata and metadata exchange model:
– MAPS (Metadata Attribute Presentation Structure)
– Independent of the internal representation of the attributes inside the catalog.
– Provides a uniform interface specification that can be used between user applications and the MCAT catalog, and vice versa.
– Structures which form the MAPS:
• MAPS_Query_Struct,
• MAPS_Result_Struct,
• MAPS_Update_Struct and
• MAPS_Definition_Struct

SRB (4)

• Simple data access scenario:
– The SRB server spawns an SRB agent to authenticate the user/application by comparing it with information stored in MCAT.
– Find the location in MCAT.
– Check the user request against permissions stored in MCAT.
– The SRB agent contacts the user with the result of the request.
– The SRB agent communicates with the user through a port specific to this client session.
• SRB server chaining scenario (integrated SRBs):
– First 3 steps from the simple data access case.
– The SRB agent contacts a remote SRB agent via the remote SRB server.
– The second SRB agent returns the pointer to the data item to the first SRB agent, which passes it on to the user.
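
The first steps of the simple data access scenario (authenticate, locate, check permissions, return the result) can be written down as a small Python function. The dictionary layout standing in for MCAT, the plain-text credential check and all names are assumptions for illustration only; real SRB authentication and session handling are not modelled.

```python
def handle_request(mcat, user, secret, data_name):
    """Follow the access steps against a dictionary standing in for MCAT."""
    # Step 1: authenticate the user against information kept in the catalog.
    if mcat["users"].get(user) != secret:
        return "authentication failed"
    # Step 2: find the location of the requested item in the catalog.
    entry = mcat["data"].get(data_name)
    if entry is None:
        return "not found"
    # Step 3: check the request against the stored permissions.
    if user not in entry["readers"]:
        return "permission denied"
    # Step 4: report the result (here, the location) back to the user.
    return entry["location"]


# Made-up catalog content.
mcat = {
    "users": {"bob": "s3cret", "arthur": "pa55"},
    "data": {
        "myFile": {"location": "unix-disk:/srb/myFile", "readers": {"bob"}},
    },
}

ok = handle_request(mcat, "bob", "s3cret", "myFile")
denied = handle_request(mcat, "arthur", "pa55", "myFile")
```

In the chaining scenario the same first three steps would run locally, and the lookup result would point to a remote server instead of a local storage resource.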

ASIS vs. SRB

• SRB doesn’t define metadata in an XML structure (as ASIS does)
• SRB uses any data, but ASIS uses ASL
• SRB keeps the metadata in Catalogue Services (MCAT). ASIS uses XML-structured capability metadata
• SRB has a central metadata handling approach; ASIS has a distributed metadata handling approach
• ASIS’s data integration is based on metadata federation; SRB’s data integration is based on SRB server federation.

Sompel’s DL (1)

• Scholarly communication as a network-based workflow
• Instead of Filters and ASL in ASIS, Sompel defines “repositories” and “digital objects”, respectively.
• A repository is a networked system that provides services pertaining to a collection of digital objects
• Repositories have common service interfaces.
– “Obtain”, “Harvest” and “Put”.
• Two classes of participants.
– Data providers (DP) and Service providers (SP)
• SPs collect metadata from DPs (via the three service interfaces), then normalize and cluster it to deal with duplicates.

Sompel’s DL (2)

• Data and storage model:
– Data is the abstraction of the Digital Objects
– Digital Objects = digital data + key metadata.
– Serialization of Digital Objects = Surrogates
– Surrogates
• Information for the value chains and service
• information used at repository service interfaces.
• In the XML/RDF format
• Composed of “dataStream” and/or “Entity” tag elements.
• A chained object is defined by keymetadataID or “providerInfo”.
– Different storage types: book repositories, teaching object repositories, dataset repositories etc.

Sompel’s DL (3)

• Metadata model:
– Surrogates are essentially metadata records for objects
– Based on the Dublin Core format with domain-specific extensions.
– Dublin Core has 15 standard elements to define resources.
– For more details see http://dublincore.org
• Chaining for integrating data:
– The application/user doesn’t need to use a workflow engine or script to create or run the chain (as in ASIS).
– The chain (which they call a “value chain”) is hidden in the surrogates.
– Surrogates are updated through the common interfaces (“put”, “obtain” and “harvest”) of the resources.
– The chain is defined in the “Entity” element of the surrogate document with the “Lineage” sub-element.
• Sample chaining scenario:
– A paper might have references to some papers, and these papers might in turn reference other papers.
– The value chain does not stop.
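
Following a value chain through lineage links amounts to a graph traversal. The Python sketch below assumes, purely for illustration, that each surrogate is reduced to a dictionary with a `lineage` list of referenced identifiers; the actual surrogates are XML/RDF documents whose layout is not reproduced here. Because citation chains can loop, the traversal keeps track of what it has already visited.

```python
def value_chain(surrogates, start_id):
    """Return the ids reachable from start_id by following lineage links."""
    seen = set()
    order = []
    stack = [start_id]
    while stack:
        sid = stack.pop()
        if sid in seen or sid not in surrogates:
            continue  # already visited, or no surrogate known for this id
        seen.add(sid)
        order.append(sid)
        # Push in reverse so links are followed in their listed order.
        stack.extend(reversed(surrogates[sid]["lineage"]))
    return order


# Hypothetical surrogates: paper-A cites B and C, B cites C, C cites A.
surrogates = {
    "paper-A": {"lineage": ["paper-B", "paper-C"]},
    "paper-B": {"lineage": ["paper-C"]},
    "paper-C": {"lineage": ["paper-A"]},
}

chain = value_chain(surrogates, "paper-A")
```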

ASIS vs. Sompel’s Approach

• Instead of Filters and ASL in ASIS, Sompel defines “repositories” and “digital objects”, respectively
• DPs correspond to End-Filters, and SPs correspond to Filters in ASIS
• ASIS does not have publishing or putting service interfaces
– “Obtain” corresponds to “getData” in ASIS
– “Harvest” corresponds to “getCapabilities” in ASIS
• Both have distributed metadata approaches for data integration
– ASIS: direct communication between Filters using the “getCapabilities” interface
– Sompel’s DL: direct communication between repositories and services using the “Harvest” interface
• Sompel’s DL uses Dublin Core for the representation of the resources
– ASIS uses its own schema.

Summary

• Application-Integration Framework (ASIS)
– Easy to add new sources
– Using online Filters providing the required adaptors
– Peer-to-peer chain of Filters
– No central metadata catalog server
– Distributed capability exchange and aggregation
– SOA
• Re-usable components (Filters) for different applications in a predefined domain
• Implications of Filter services
– Scalable and fault-tolerant
• Load-balancing and caching

Capability in Grid Services Security

• XPOLA
– The infrastructure is built on a peer-to-peer chain-of-trust model. No central admins
– WS-Security compliant
– Extensible
– PKI and SAML based
– Dynamic and reusable (manually or automatically generated)
– Composed of two sectors.
• Policy document (SAML, lifetime info, binding info etc.)
• Provider’s signature
• Existing grid security solutions for fine-grained authorization did not address general Web/Grid services in compliance with Web Services security specs.

Sample Capabilities File (too simplified) – GIS Domain

<?xml version='1.0' encoding="UTF-8" standalone="no" ?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM "http://toro.ucs.indiana.edu:8086/xml/capabilities.dtd">
<Capabilities version="1.1.1" updateSequence="0">
  <Service>
    <Name>CGL_Mapping</Name>
    <Title>CGL_Mapping WMS</Title>
    <OnlineResource xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
    <ContactInformation> ….. </ContactInformation>
  </Service>
  <Capability>
    <Request>
      <GetCapabilities>
        <Format>WMS_XML</Format>
        <DCPType><HTTP><Get>
          <OnlineResource xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
        </Get></HTTP></DCPType>
      </GetCapabilities>
      <GetMap>
        <Format>image/GIF</Format>
        <Format>image/PNG</Format>
        <DCPType><HTTP><Get>

Dublin Core

• Addresses the challenge of resource description and discovery
• A language for making a particular class of statements about resources
• There are two namespaces: the Dublin Core element set (dc) and Dublin Core qualifiers (dcq, e.g. dcq:iso8601).
• Some of the Dublin Core metadata element set
– Title (dc:title), subject, description, creator, publisher, type, format, source, language, rights
• Using DC in RDF: specifications for DC in RDF (work in progress)
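
As a small illustration of the `dc` element set, the following Python sketch builds a record with the standard library's `xml.etree.ElementTree`. The namespace URI `http://purl.org/dc/elements/1.1/` is the one used for the Dublin Core element set; the wrapping `record` element and the `dc_record` helper are hypothetical, and the field values simply reuse this presentation's title and author.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # serialize with the conventional "dc" prefix


def dc_record(**fields):
    """Build a flat record whose children are Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record


record = dc_record(
    title="Approaches to the Integration of Distributed and Heterogeneous Data Resources",
    creator="Ahmet Sayar",
)
xml_text = ET.tostring(record, encoding="unicode")
```

Qualified Dublin Core or RDF serializations would need additional namespaces and structure that this sketch leaves out.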

OAI

• Deals with the e-print server world
• Need to develop services that permitted searching across papers housed at multiple repositories
• Repositories also needed capabilities to automatically identify and copy papers that had been deposited in them.
• Definition of an interface to permit e-print servers to expose the metadata for the papers that they held.
• Service providers with similar metadata standards need to harvest this metadata

OAI-PMH

• For the variety of communities engaged in publishing content on the Web
• Any networked server can employ the protocol to enable service providers to collect its metadata
• HTTP-based request-response transactions
• Service Providers
– Harvest metadata from Data Providers using the OAI protocol and use the returned metadata as a basis for building value-added services.
• Data Providers (repositories)
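
Because OAI-PMH requests are plain HTTP requests with the operation passed as a `verb` parameter, a harvesting request can be sketched by building the URL. `ListRecords` and the `oai_dc` metadata prefix are part of the protocol; the base URL and the `oai_request` helper are placeholders invented for this example, and no network call is made.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL; arguments become query parameters."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


# Hypothetical repository endpoint.
url = oai_request(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
```

A service provider would send such a request, parse the XML response and repeat the call while the repository returns a resumption token.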

Comments on OAI

• OAI-PMH is ultimately only as useful as the metadata it transports.
• The tendency of implementers to almost exclusively apply the lowest common denominator of unqualified Dublin Core makes it difficult to implement more advanced search interface features.
• Content providers should prefer more expressive

Sompel’s Approach: Hierarchy steps [figure not recovered]

Ogsa-DAI Figure [figure not recovered]

MCS

• MCS presents a design of a Metadata Catalog Service that provides a mechanism for storing and accessing descriptive metadata attributes
• Requirements: store domain-independent attributes, store user-defined attributes, query with a set of attributes, query with a logical name, authentication, authorization and auditing
• Allows users to discover data sets based on the value of descriptive attributes, rather than

MCAT vs. MCS

• MCAT can be used only with SRB
• MCS can be used only in the OGSA architecture
• MCAT stores both physical and logical addresses
• MCS stores logical metadata attributes and handles that can be resolved by data location or data access services.
• They can both be extended for serving

CLIENT

• Example interaction with SRB using Scommands:
– Sinit
• Start interaction with SRB
– Spwd
• Display current position within the SRB repository
– Smeta -i -I "UDSMD0='author'" -I "UDSMD1='bob'" myfile
• Add metadata describing the author of the file
– Smeta -i -I "UDSMD0='author'" -I "UDSMD1='arthur'"
• Search for files with author metadata set to arthur
– Sget myFile
• Copy myFile from SRB to local storage
– Sreplicate -S anotherResource myFile
• Create a replica of myFile on anotherResource
– Srm myFile
• Remove myFile (and all replicas) from SRB
– Sexit
