Approaches to Interoperability

Achieving interoperability is a complex task that involves a balanced mixture of communication, cooperation and competition among the communities and software systems of a particular data domain. Community networks have been formed that bring together the experts of a domain to share ideas, discuss research issues, and develop interoperable software systems and data communication standards that enable data interoperability. Examples of such networks in the biodiversity domain are ENBI (European Network for Biodiversity Information), NBN (The National Biodiversity Network) and GBIF (Global Biodiversity Information Facility). These networks and related biodiversity projects are discussed in more detail in section 2.4. This section describes the common approaches and related technologies used to address interoperability.

Federated database system - an integrated collection of fully functional, independent databases that are controlled by local administrators but cooperate with the federation by supporting global operations [33]. A federation can be either tightly or loosely coupled. A tightly coupled federated system presents a predefined, static view to the end user. This view is usually based on a global schema that accommodates all the component schemas and is maintained by a system administrator, who makes all schematic and semantic integration decisions in advance. In a loosely coupled system the integration is dynamic: either the user is responsible for integrating the data, or the system has to provide a mechanism for doing so.
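The tightly coupled case can be illustrated with a minimal sketch. It assumes two in-memory SQLite databases standing in for autonomous component databases; the table names, column names and the `find_by_name` operation are hypothetical and are not taken from [33]. The point is only that the mapping from the global schema to each local schema is fixed in advance, so the user sees one static view.

```python
import sqlite3

# Two autonomous component databases with different local schemas (hypothetical).
herbarium = sqlite3.connect(":memory:")
herbarium.execute("CREATE TABLE specimen (sci_name TEXT, country TEXT)")
herbarium.execute("INSERT INTO specimen VALUES ('Quercus robur', 'UK')")

museum = sqlite3.connect(":memory:")
museum.execute("CREATE TABLE record (taxon TEXT, nation TEXT)")
museum.execute("INSERT INTO record VALUES ('Quercus robur', 'France')")

# Global schema: (scientific_name, country). The mapping to each local schema
# is decided in advance by the federation administrator (tight coupling).
GLOBAL_VIEW = [
    (herbarium, "SELECT sci_name, country FROM specimen WHERE sci_name = ?"),
    (museum, "SELECT taxon, nation FROM record WHERE taxon = ?"),
]


def find_by_name(scientific_name):
    """Run one global operation against every component database."""
    rows = []
    for connection, local_query in GLOBAL_VIEW:
        rows.extend(connection.execute(local_query, (scientific_name,)).fetchall())
    return sorted(rows)


results = find_by_name("Quercus robur")
```

In a loosely coupled federation the contents of `GLOBAL_VIEW` would not be fixed; the user, or a mechanism supplied by the system, would assemble the mappings at query time.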

Client-server architecture - provides the ability of two or more components to cooperate despite differences in interface, execution language and platform. Client-server applications achieve systems interoperability in two ways: interface standardisation, which maps the client and server interfaces to a common representation, and interface bridging, which uses two-way maps between client and server [18]. The Common Object Request Broker Architecture (CORBA) [34], the OMG's open, vendor-independent architecture, and Microsoft's Component Object Model COM/OLE [35] realize interoperability using interface standardisation. The client-server architecture restricts the autonomy and heterogeneity of distributed data sources, as they all have to conform to either a client or a server component; this also creates a maintenance problem once the system is scaled up.
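The difference between interface standardisation and interface bridging can be shown with a small sketch. The field names and helper functions below are hypothetical and do not reproduce CORBA or COM; they only contrast mapping through a common representation with mapping directly between one client and one server.

```python
# Hypothetical field names used by a client, a server and a common representation.
CLIENT_TO_COMMON = {"name": "scientificName", "place": "locality"}
SERVER_TO_COMMON = {"taxon": "scientificName", "site": "locality"}


def rename(record, mapping):
    """Rename the keys of a record according to a mapping."""
    return {mapping[key]: value for key, value in record.items()}


def invert(mapping):
    """Reverse a one-to-one mapping."""
    return {target: source for source, target in mapping.items()}


# Interface standardisation: client -> common representation -> server.
def client_to_server_via_common(record):
    common = rename(record, CLIENT_TO_COMMON)
    return rename(common, invert(SERVER_TO_COMMON))


# Interface bridging: a direct two-way map between this client and this server.
CLIENT_TO_SERVER = {"name": "taxon", "place": "site"}


def bridge(record, to_server=True):
    mapping = CLIENT_TO_SERVER if to_server else invert(CLIENT_TO_SERVER)
    return rename(record, mapping)


request = {"name": "Quercus robur", "place": "Cardiff"}
standardised = client_to_server_via_common(request)
bridged = bridge(request)
```

With standardisation each new component needs only one mapping to the common representation, whereas bridging needs a separate two-way map for every client-server pair.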

Mediator systems - provide a remedy for the limitations of client-server architecture, as they recognize the autonomy and diversity of the data systems [36], [37]. A mediator acts as an interchange component that translates data between two systems with different data schemas, turning data into information by applying knowledge about resources, the semantics of the data and user requirements. The mediator handles an information exchange by converting the user query into a source-compatible query and executing it. The result is then converted back into a format the user recognizes. In short, the mediator acts as a semantic gateway between the systems, allowing the user to view all the sources without concern for differences in the names and representations of data.
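The three steps a mediator performs (translate the query, execute it, convert the result back) can be sketched as follows. The two sources, their field names and the `mediate` function are hypothetical; real mediators [36], [37] also apply richer knowledge about resources and user requirements than this simple field renaming.

```python
# Hypothetical sources, each with its own field names for the same concepts.
SOURCES = {
    "herbarium": {
        "fields": {"scientific_name": "sci_name", "country": "ctry"},
        "records": [{"sci_name": "Quercus robur", "ctry": "UK"}],
    },
    "museum": {
        "fields": {"scientific_name": "taxon", "country": "nation"},
        "records": [
            {"taxon": "Quercus robur", "nation": "France"},
            {"taxon": "Fagus sylvatica", "nation": "France"},
        ],
    },
}


def mediate(user_query):
    """Translate a user-level query per source, run it, and convert results back."""
    results = []
    for source_name, source in SOURCES.items():
        to_local = source["fields"]
        to_user = {local: user for user, local in to_local.items()}
        # 1. Convert the user query into a source-compatible query.
        local_query = {to_local[field]: value for field, value in user_query.items()}
        # 2. Execute it against the source (here, a simple in-memory filter).
        matches = [
            record for record in source["records"]
            if all(record.get(key) == value for key, value in local_query.items())
        ]
        # 3. Convert each result back into the user's vocabulary.
        for record in matches:
            converted = {to_user[key]: value for key, value in record.items()}
            converted["source"] = source_name
            results.append(converted)
    return results


oaks = mediate({"scientific_name": "Quercus robur"})
```

The caller only ever uses `scientific_name` and `country`; the differing local names stay hidden behind the mediator.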

Multiple View Definition System (MVDS) - focuses on the software architecture needed to achieve interoperability in heterogeneous multidatabases [38]. Providing a tool (typically automated) with which users define integration views to infer information from multidatabases is one way of supporting interoperation among heterogeneous and autonomous databases [39], [40]. Ontologies covering all participating schema components are not a complete solution, as they do not give users enough information to query the heterogeneous databases. A canonical data model, together with an architecture that uses a knowledge base as a mediator storing the static and dynamic knowledge about the participating databases, has proved to be one answer to the issue of interoperation. A variety of other approaches to developing mediator systems involve the use of:

• Wrappers - Wrapping is a method of allowing existing legacy software systems to communicate with current systems. A wrapper program can be described in two parts: an adapter that provides extra functionality to an application, and an encapsulation mechanism that binds the adapter to the application [41]. It provides the communication interface between application programs by converting the data as required. The ability to interoperate depends on the levels of abstraction in the design, and on the extensibility and maintainability of the wrappers.

• Data Warehouses [42] - A data warehouse is a centralized repository of information extracted from multiple data sources. It can serve as an index or as a cleaned copy of data gathered from different heterogeneous systems. The disadvantage of this approach is the difficulty of updating the data and keeping it synchronized with the local databases as the number of participating databases grows.

• Metadata Repository systems - In these systems queries are formulated dynamically with the use of an online global metadata dictionary [43]. The metadata can be stored using schema maps, data types with description logics, and ontologies to answer queries over multiple web-based information sources.

• Shared Ontologies - A common ontology approach resolves semantic heterogeneity in a particular domain by using knowledge ontologies [44], [45], which contain deep domain knowledge and form a conceptual standard.
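Of the items listed above, the wrapper is the most direct to illustrate in code. The sketch below assumes a hypothetical legacy catalogue that returns pipe-delimited strings; the class names, record code and output field names are illustrative only. The adapter part converts the data, and the encapsulation part is the reference that binds the adapter to the legacy application.

```python
class LegacyCatalogue:
    """Stand-in for a legacy system that only returns pipe-delimited strings."""

    _DATA = {"QR01": "Quercus robur|Fagaceae|UK"}

    def lookup(self, code):
        return self._DATA.get(code, "")


class CatalogueWrapper:
    """Wrapper exposing the legacy catalogue through a dictionary-based interface."""

    def __init__(self, legacy):
        # Encapsulation: binds the adapter to the legacy application.
        self._legacy = legacy

    def get_record(self, code):
        # Adapter: converts the legacy format into the structure current systems expect.
        raw = self._legacy.lookup(code)
        if not raw:
            return None
        name, family, country = raw.split("|")
        return {"scientificName": name, "family": family, "country": country}


wrapper = CatalogueWrapper(LegacyCatalogue())
record = wrapper.get_record("QR01")
```

The legacy class is left unmodified; only the wrapper has to change if the current systems require a different output structure.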

The approaches described in this section are the main technologies and tools used by software information systems to create an integrated querying infrastructure for accessing multiple, distributed and heterogeneous data resources. Each approach has made progress towards interoperability but still has limitations. For example, federated systems require a common data model that is understood by all the participating databases; if the data models vary, another layer of mediation between the data structures is required. Client-server architecture requires the bulk of the processing to be performed on the server side, and the clients have to be continuously maintained to keep up with changes on the server side. With the advent of web-based data communication, client-server architecture has become less preferred for designing distributed systems because it requires centralized maintenance of the system. Tools such as wrappers, metadata repositories and ontologies are used to convert or translate data formats. The choice of these tools, and their ability to interpret the data formats, affects the design and implementation of systems that query multiple sources. Our research evolves from analysing these technologies and tools in light of the nature of the biodiversity data domain and the real data sets of biodiversity data providers.

The technical details of the interoperability approach adopted are discussed further in section 4.2. With reference to the types of interoperability described in section 2.2 of this chapter, this research deals with the structural and semantic interoperability issues prevailing among the established data provider communities in the biodiversity data domain. To address technical interoperability, the system design and architecture of the framework use standard internet communication protocols. Syntactic and structural interoperability are addressed by the use of XML transformations. Most of the semantics of the biodiversity data concepts are captured in a knowledgebase that is part of the architecture, as an alternative to capturing the semantics with a data model technique such as RDF. The limitation of this approach is that the developer of the knowledgebase needs to be aware of biodiversity data concepts, which are discussed in the remainder of this chapter, and must maintain the knowledgebase as the relevant standards evolve.
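As a rough illustration of how an XML transformation and a knowledgebase might work together, the sketch below renames provider-specific XML elements to shared concept names using a lookup table. It is an assumption made for illustration, not the framework's design, which is described in section 4.2: the element names, the `KNOWLEDGEBASE` dictionary and the `transform` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical knowledgebase: maps provider-specific element names to shared concepts.
KNOWLEDGEBASE = {
    "SciName": "scientificName",
    "Taxon": "scientificName",
    "Ctry": "country",
}


def transform(provider_xml):
    """Rewrite a provider record into the shared vocabulary."""
    source = ET.fromstring(provider_xml)
    target = ET.Element("record")
    for child in source:
        concept = KNOWLEDGEBASE.get(child.tag)
        if concept is None:
            continue  # Elements with no known concept are skipped in this sketch.
        ET.SubElement(target, concept).text = child.text
    return ET.tostring(target, encoding="unicode")


unified = transform(
    "<Specimen><SciName>Quercus robur</SciName><Ctry>UK</Ctry><Note>x</Note></Specimen>"
)
```

The maintenance burden mentioned above shows up here as the need to keep `KNOWLEDGEBASE` current: a provider element with no entry is silently dropped until someone who understands the concept adds it.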

In document Interoperability between heterogeneous and distributed biodiversity data sources in structured data networks (Page 30-34)