Chapter 2 Literature Review
2.4 Ontology-driven Semantic Approaches
2.4.3 Ontology Integration
2.4.3.3 Ontology Integration Process and Methodology
McGuinness introduces a specification of the integration process in [McGuinness, et al., 2000], where ontology integration consists of (the iteration of) the following steps:
(1) find the places in the ontologies where they overlap;
(2) relate concepts that are semantically close via equivalence and subsumption relationships (aligning);
(3) check the consistency, coherency and non-redundancy of the result.
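The three steps above can be illustrated with a minimal sketch. It is not taken from [McGuinness, et al., 2000]: the word-overlap test, the rule that a name with fewer modifiers denotes the more general concept, and all function names are assumptions made only to show the shape of one iteration.

```python
from itertools import product


def words(name):
    """Lower-case a concept name and split it into a set of words."""
    return frozenset(name.lower().replace("_", " ").split())


def find_overlap(concepts_a, concepts_b):
    """Step 1 (assumed heuristic): candidate pairs whose names share a word."""
    return [(a, b) for a, b in product(concepts_a, concepts_b)
            if words(a) & words(b)]


def align(candidates):
    """Step 2: relate candidates by equivalence or subsumption."""
    relations = []
    for a, b in candidates:
        wa, wb = words(a), words(b)
        if wa == wb:
            relations.append((a, "equivalent", b))
        elif wa < wb:  # assumption: fewer modifiers means more general
            relations.append((a, "subsumes", b))
        elif wb < wa:
            relations.append((b, "subsumes", a))
    return relations


def check(relations):
    """Step 3: drop duplicates and report concepts with several equivalents."""
    unique = list(dict.fromkeys(relations))
    equivalents = {}
    for left, kind, right in unique:
        if kind == "equivalent":
            equivalents.setdefault(left, set()).add(right)
    conflicts = {c: e for c, e in equivalents.items() if len(e) > 1}
    return unique, conflicts


# Hypothetical toy ontologies, given only as lists of concept names.
onto_a = ["Car", "Sports_Car", "Person"]
onto_b = ["car", "Truck", "person"]
relations, conflicts = check(align(find_overlap(onto_a, onto_b)))
```

In practice the conflicts reported by the last step would be resolved by a human or by a further pass, which is why the process is described as iterative.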
As pointed out by Noy in [Noy, 2003], it may never be possible to find all alignments or mappings between ontologies completely and automatically, since some of the intended semantics can only be discerned by humans. However, ontology integration on a large scale will be possible only if significant progress is made in identifying mappings automatically or semi-automatically. Methodologies are therefore necessary to guide and support automatic or semi-automatic ontology integration.
(1) Basic Strategy for Discovering Concept Similarity
Comparing concepts for similarity is a fundamental issue in ontology integration. Alignment, mapping, or merging is possible only once the semantically similar concepts in the different ontologies have been discovered.
The basic alignment algorithm in ArtGen [Mitra and Wiederhold, 2002] calculates the similarity between concepts based on their names, which are treated as lists of words. One method to compute the similarity between a pair of words is based on the similarity between the contexts (1000-character neighbourhoods) of all occurrences of the words in a set of domain-specific Web pages.
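The context-based idea can be sketched as follows. This is not the ArtGen implementation: the bag-of-words representation, the cosine measure, the reading of the 1000 characters as the total window width, and the averaging of best word matches over a name are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def context_vector(word, documents, window=1000):
    """Count the words found in the neighbourhood of every occurrence of `word`.

    `window` is the total neighbourhood width in characters (an assumption).
    Words cut off at the window edges are counted as fragments.
    """
    vec = Counter()
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for doc in documents:
        for m in pattern.finditer(doc):
            start = max(0, m.start() - window // 2)
            end = m.end() + window // 2
            neighbourhood = doc[start:m.start()] + " " + doc[m.end():end]
            vec.update(re.findall(r"[a-z]+", neighbourhood.lower()))
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def word_similarity(w1, w2, documents, window=1000):
    """Similarity of two words, judged by the contexts they occur in."""
    if w1.lower() == w2.lower():
        return 1.0
    return cosine(context_vector(w1, documents, window),
                  context_vector(w2, documents, window))


def name_similarity(name_a, name_b, documents, window=1000):
    """Average, over the words of name_a, of the best match in name_b."""
    words_a, words_b = name_a.lower().split(), name_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    best = [max(word_similarity(a, b, documents, window) for b in words_b)
            for a in words_a]
    return sum(best) / len(best)


# Hypothetical stand-in for a set of domain-specific Web pages.
documents = [
    "the car drives on the road with an engine",
    "the automobile drives on the road with an engine",
    "the banana grows on a tree in the sun",
]
```

With this toy corpus, `word_similarity("car", "automobile", documents)` is higher than `word_similarity("car", "banana", documents)`, because the first pair shares almost all of its context words.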
In FCA-MERGE [Stumme and Maedche, 2001] the user constructs a merged ontology based on a concept lattice. The concept lattice is derived by formal concept analysis from the way documents in a given domain-specific corpus are classified under the concepts of the ontologies using natural language processing techniques. OntoMapper [Prasad, et al., 2002] provides an ontology alignment algorithm using Bayesian learning. A set of documents (abstracts of technical papers taken from ACM’s digital library and Citeseer) is assigned to each concept in the ontologies. Two matrices of raw similarity scores are computed, one for each ontology. The similarity between the concepts is then calculated from these two matrices using the Bayesian method.
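The concept-lattice idea behind FCA-MERGE can be sketched with a small formal context. The sketch assumes that the classification of documents under concepts is already given (FCA-MERGE obtains it with natural language processing), and the `A:`/`B:` prefixes and document names are hypothetical. It computes only the formal concepts, not the lattice ordering or the interactive merging step.

```python
def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a formal context.

    `context` maps each document to the set of ontology concepts it is
    classified under. The intents are all intersections of the documents'
    concept sets, plus the full set of concepts.
    """
    attributes = set().union(*context.values())
    intents = {frozenset(attributes)}
    for doc_attrs in context.values():
        intents |= {frozenset(doc_attrs) & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(d for d, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def merge_candidates(concepts):
    """Intents that hold for some document and mix both ontologies."""
    return [intent for extent, intent in concepts
            if extent
            and any(a.startswith("A:") for a in intent)
            and any(a.startswith("B:") for a in intent)]


# Hypothetical classification of four documents under two ontologies.
context = {
    "doc1": {"A:Hotel", "B:Accommodation"},
    "doc2": {"A:Hotel", "B:Accommodation"},
    "doc3": {"A:Event", "B:Concert"},
    "doc4": {"A:Event"},
}
concepts = formal_concepts(context)
candidates = merge_candidates(concepts)
```

Here `A:Hotel` and `B:Accommodation` share exactly the same documents, so they appear together in one formal concept and would be offered to the user as candidates for merging.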
Some systems implement alignment algorithms based on the structure of the ontologies. Most of them rely on the existence of previously aligned concepts. For instance, Anchor-PROMPT [Noy and Musen, 2001] determines the similarity of concepts by the frequency of their appearance along the paths between previously aligned concepts. The paths may be composed of any kind of relation. SAMBO [Lambrix and Tan, 2006] provides a component in which the similarity between concepts is increased based on their location in the is-a hierarchy relative to already aligned concepts. OntoMapper does not require previously aligned concepts and takes the documents from the sub-concepts into account when computing the similarity between two concepts.
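The path-based idea can be sketched as below. It is a simplified illustration in the spirit of Anchor-PROMPT rather than the published algorithm: ontologies are plain directed graphs whose edges stand for any kind of relation, only paths of equal length are compared, and each pair of concepts at the same position on two such paths receives one point. The graphs, anchors and the `max_len` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def simple_paths(graph, start, goal, max_len):
    """Yield cycle-free paths from start to goal with at most max_len edges."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            yield path
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:
                stack.append(path + [nxt])


def anchor_scores(graph_a, graph_b, anchors, max_len=4):
    """Score concept pairs that occupy the same position on parallel paths."""
    scores = defaultdict(int)
    for (a1, b1), (a2, b2) in permutations(anchors, 2):
        paths_b = list(simple_paths(graph_b, b1, b2, max_len))
        for path_a in simple_paths(graph_a, a1, a2, max_len):
            for path_b in paths_b:
                if len(path_a) != len(path_b):
                    continue
                # Skip the end points, which are the anchors themselves.
                for ca, cb in zip(path_a[1:-1], path_b[1:-1]):
                    scores[(ca, cb)] += 1
    return dict(scores)


# Hypothetical ontologies as adjacency lists, with two known anchor pairs.
graph_a = {"Trial": ["Design"], "Design": ["Person"]}
graph_b = {"Study": ["Plan"], "Plan": ["Human"]}
anchors = [("Trial", "Study"), ("Person", "Human")]
scores = anchor_scores(graph_a, graph_b, anchors)
```

Pairs that accumulate a high score across many anchor pairs would be proposed as related; in this toy case the only such pair is `("Design", "Plan")`.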
(2) Research on Methodologies
An early methodology for ontology merging in a medical domain is proposed in [Gangemi, et al., 1998]. The methodology for building ontologies presented in [Uschold and King, 1995] includes an integration step. This methodology proposes that integration should be done during capturing (knowledge acquisition), during coding (implementation), or during both. However, the problem is recognized as difficult, and no solution for how integration should be performed is proposed or discussed therein.
The methodology for building ontologies proposed in [Gruninger, 1996] also refers to integration. This methodology mentions two kinds of integration: “combining ontologies that have been designed for the same domain” and “combining ontologies from different domains”. According to this methodology, ontologies are built from ontology building blocks and foundational theories. Depending on the building blocks and foundational theories of the ontologies being integrated, three cases are distinguished: integration at the level of the building blocks, which is the simplest; integration at the level of the foundational theories, which is more difficult and may result in only partial integration; and ontology translation, which applies when the ontologies are so different that they share neither building blocks nor foundational theories, and which makes integration extremely difficult.
METHONTOLOGY [Fernandez, et al., 1997; Fernandez, et al., 1999] is another methodology for building ontologies that also considers integration. It proposes that the development of an ontology should follow an evolving prototyping life cycle rather than a waterfall one. It also proposes that ontology building, and therefore ontology integration, should preferably be done at the knowledge level (in conceptualization) and not at the symbol level (in formalization, when selecting the representation ontology) or at the implementation level (when the ontology is codified in a target language).
The methodology presented in [Skuce, 1997] for finding ontological distinctions relies on brainstorming, followed by meetings with other researchers interested in the problem. It begins with the creation of a diverse group of researchers working in different locations. Each member develops a list of carefully chosen primitives, distinctions and categories, and documents both the choices and their definitions. The choices are presented to the group for discussion and approval. Only when they have been agreed upon do they proceed to the formalization stage. The idea is to arrive at a standardized upper model that would greatly ease some kinds of integration efforts.
Other methods have also been proposed. Hovy and colleagues describe a set of heuristics that researchers at ISI/USC used for the semi-automatic alignment of domain ontologies to a large central ontology [Hovy, 1998]. Their techniques are based mainly on linguistic analysis of concept names and of the natural-language definitions of concepts. PROMPT uses the structure of ontology definitions and the structure of the graph representing an ontology to suggest to the ontology designer which concepts may be related [Noy and Musen, 2003]. GLUE applies machine-learning techniques to instance data conforming to the ontologies in order to find related concepts [Doan, et al., 2002].
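The instance-based idea can be illustrated with a minimal sketch. It is not GLUE itself: GLUE learns classifiers to estimate which instances belong to which concept, whereas this sketch assumes the two ontologies already share instance identifiers and simply compares the instance sets with the Jaccard coefficient. All concept and instance names are hypothetical.

```python
def jaccard(instances_a, instances_b):
    """Share of instances common to both concepts among all their instances."""
    union = instances_a | instances_b
    return len(instances_a & instances_b) / len(union) if union else 0.0


def best_matches(onto_a, onto_b):
    """For each concept in onto_a, pick the most similar concept in onto_b."""
    return {
        concept_a: max(onto_b, key=lambda cb: jaccard(inst_a, onto_b[cb]))
        for concept_a, inst_a in onto_a.items()
    }


# Hypothetical instance data: concept name -> set of instance identifiers.
onto_a = {
    "Faculty": {"ann", "bob", "eve"},
    "Course": {"cs101", "cs102"},
}
onto_b = {
    "AcademicStaff": {"ann", "bob"},
    "Module": {"cs101", "cs102", "cs103"},
}
matches = best_matches(onto_a, onto_b)
```

When instances are not shared between the ontologies, the set intersection has to be replaced by an estimate, which is where a learned classifier comes in.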