Ontology Matching Prototypes - SCHEMA MATCHING AND MAPPING-BASED DATA INTEGRATION

We discuss several prototypes specifically developed for ontology matching: GLUE [35], OLA [49], ONION [99, 98, 100], PROMPT [107, 109], QOM [40, 41], SCM [68], S-MATCH [53, 54, 3], and the work of Mork and Bernstein [101]. Several of them, in particular OLA, PROMPT, QOM, and SCM, participated in the EON Ontology Alignment Contest in 2004 [44, 48]. Except for GLUE and SCM, which consider class instances, the prototypes are schema-based and only utilize metadata, such as class/concept names and the structure of the ontologies.

ONION (2000)

The ONION (Ontology Composition) system [99] focuses on managing and manipulating mappings between ontologies for semantic integration of knowledge bases. It first employed SKAT (Semantic Knowledge Articulation Tool) [98] to identify correspondences between ontologies. SKAT follows a rule-based approach with rules specified in first-order logic to express match and mismatch relationships. SKAT also supports methods to derive new correspondences, such as name matching and simple structural matching techniques based on is-a hierarchies. SKAT was later superseded by ARTGEN (Articulation Generator) [100], which provides two linguistic matchers exploiting terminological relationships in WordNet and frequency statistics of keywords in a corpus, respectively. A structure matcher is supported to search for further correspondences in the neighborhood of those matching elements suggested by the linguistic matchers.

PROMPT (2000)

PROMPT [107, 109] was originally developed as a tool for ontology merging, which guides the user through the process with suggestions about classes and properties to be merged. Such suggestions are generated by two modules, ANCHORPROMPT and PROMPTDIFF, following different match approaches. ANCHORPROMPT converts ontologies into directed labeled graphs. Provided with a set of correspondences between the so-called anchor nodes, ANCHORPROMPT identifies the paths between the anchors within each ontology and compares the classes on the paths with each other. The similarity between two classes is increased if they appear at the same position on the paths. Finally, classes with similarity exceeding the median of all computed similarities are considered matching.

PROMPTDIFF focuses on matching different versions of an ontology. It uses a multitude of matchers based on different heuristics, such as type and name equality, similar siblings, and similar name suffixes and prefixes. A global result table is maintained containing all combinations of source and target elements of the same category, e.g., class or property. Each matcher scans through the table, picks those element pairs that are not yet matched, and makes predictions for them if possible. It is thus sufficient for two elements to be matched by a single matcher (i.e., Max aggregation). The execution of the matchers is iterated as a fix-point computation until the result table no longer changes, i.e., no further correspondences are found.
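The fix-point iteration over a shared result table can be illustrated with a minimal Python sketch. The two toy ontology versions, both matcher heuristics, and all function names below are hypothetical and are not taken from PROMPTDIFF itself; the sketch only shows how matchers that depend on earlier matches make repeated passes necessary.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]
Matcher = Callable[[Pair, Set[Pair]], bool]

# Hypothetical toy versions of one ontology: class name -> parent class name.
OLD: Dict[str, Optional[str]] = {"Wine": None, "RedWine": "Wine", "Grape": None}
NEW: Dict[str, Optional[str]] = {"Wine": None, "Red_Wine": "Wine", "Region": None}


def same_name(pair: Pair, matched: Set[Pair]) -> bool:
    """Heuristic 1 (assumed): identical names match."""
    return pair[0] == pair[1]


def matched_parents_and_suffix(pair: Pair, matched: Set[Pair]) -> bool:
    """Heuristic 2 (assumed): parents already matched and same 4-letter suffix."""
    src, tgt = pair
    parents = (OLD[src], NEW[tgt])
    return parents in matched and src[-4:].lower() == tgt[-4:].lower()


def run_to_fixpoint(candidates: List[Pair],
                    matchers: List[Matcher]) -> Tuple[Set[Pair], int]:
    """Re-run all matchers until a full pass adds no new correspondence."""
    matched: Set[Pair] = set()
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for matcher in matchers:
            for pair in candidates:
                # Only pairs that are not yet matched are examined; one
                # positive prediction from any matcher is sufficient.
                if pair not in matched and matcher(pair, matched):
                    matched.add(pair)
                    changed = True
    return matched, passes


# The result table holds all source/target combinations of one category.
candidates = list(product(OLD, NEW))
result, passes = run_to_fixpoint(candidates,
                                 [matched_parents_and_suffix, same_name])
print(sorted(result), passes)
```

In this toy run the suffix heuristic finds nothing in the first pass because no parents are matched yet. It succeeds in the second pass, and the third pass confirms that the table is stable.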

GLUE (2002)

GLUE [35] extends the earlier schema matching system LSD [34] to perform matching between taxonomies. As the input taxonomies come with different sets of instances, GLUE first performs a classification process to associate the instances of classes in the source taxonomy with the classes of the target taxonomy and vice versa. To do this, GLUE employs the composite approach of LSD, utilizing several machine learning-based matchers and a meta-learner to combine their predictions. The similarity between two classes is then computed from the numbers of common and distinct instances identified for the classes. Finally, a so-called relaxation labeling technique is applied to the similarity matrix of classes to search for a mapping configuration that best satisfies the given domain constraints specifying match and mismatch rules.
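One common way to turn counts of common and distinct instances into a similarity value is the Jaccard coefficient. The text above does not name the exact formula, so the choice of Jaccard, the function name, and the instance identifiers in the following Python sketch are assumptions for illustration only.

```python
from typing import Set


def jaccard_similarity(instances_a: Set[str], instances_b: Set[str]) -> float:
    """Share of common instances among all instances of either class."""
    union = instances_a | instances_b
    if not union:
        return 0.0  # two empty classes: no evidence, treat as dissimilar
    return len(instances_a & instances_b) / len(union)


# Hypothetical outcome of the classification step: the instances assigned
# to class A of the source taxonomy and class B of the target taxonomy.
class_a = {"i1", "i2", "i3", "i4"}
class_b = {"i3", "i4", "i5"}

# 2 common instances out of 5 distinct ones overall.
similarity = jaccard_similarity(class_a, class_b)
print(similarity)
```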

OLA (2003)

OLA (OWL-Lite Alignment) [49] was specifically developed to match ontologies written in the OWL-Lite dialect. OLA performs a pairwise comparison of source and target elements of the same category, such as class, property, instance, or data type. In particular, the similarity between two elements of a particular category, e.g., class, is a weighted sum of the similarities for all relevant features of this category, such as names, super-/subclasses, and properties. While data type similarity has to be pre-specified, OLA uses synonym relationships in WordNet and a string matching function based on common substrings to compute name similarity. As there is a recursive relationship between elements and their related features/elements in the comparison, OLA employs fix-point computation to successively compute and propagate similarity from the features/elements at the lowest level of dependency, i.e., those computed without looking at related elements, to their neighborhood.
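The combination of a weighted sum with fix-point propagation can be sketched in Python as follows. This is a simplified illustration, not OLA's actual procedure: only two features (name and superclass) are used, the weights are arbitrary, `difflib.SequenceMatcher` stands in for the substring-based string function, WordNet is omitted, and the toy taxonomies are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, Optional, Tuple

Taxonomy = Dict[str, Optional[str]]  # class name -> superclass name (or None)


def name_similarity(a: str, b: str) -> float:
    """Stand-in string measure based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fixpoint_similarity(src: Taxonomy, tgt: Taxonomy,
                        w_name: float = 0.6, w_super: float = 0.4,
                        eps: float = 1e-6,
                        max_iter: int = 100) -> Dict[Tuple[str, str], float]:
    """Iterate the weighted sum until no similarity changes by more than eps."""
    pairs = list(product(src, tgt))
    # Lowest dependency level: name similarity needs no related elements.
    sim = {pair: name_similarity(*pair) for pair in pairs}
    for _ in range(max_iter):
        new_sim = {}
        for a, b in pairs:
            pa, pb = src[a], tgt[b]
            if pa is not None and pb is not None:
                # The class similarity depends on the superclass similarity
                # from the previous round, hence the recursion.
                new_sim[(a, b)] = (w_name * name_similarity(a, b)
                                   + w_super * sim[(pa, pb)])
            else:
                new_sim[(a, b)] = name_similarity(a, b)
        delta = max(abs(new_sim[p] - sim[p]) for p in pairs)
        sim = new_sim
        if delta < eps:
            break
    return sim


SRC: Taxonomy = {"Vehicle": None, "Car": "Vehicle"}
TGT: Taxonomy = {"Vehicle": None, "Automobile": "Vehicle"}

sim = fixpoint_similarity(SRC, TGT)
print(sim[("Car", "Automobile")], name_similarity("Car", "Automobile"))
```

In the toy data, "Car" and "Automobile" have dissimilar names, but their similarity is raised because their superclasses match exactly.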

QOM (2004)

QOM (Quick Ontology Mapping) [40, 41] focuses on matching ontologies in the RDF(S) format. Similarity between elements of the same category is computed by comparing the names of their related elements, such as super-/subclasses, instances, and properties of a class, using the EditDistance algorithm. The similarities computed for different kinds of related elements are aggregated using a weighted sum. As in COMA, correspondences are identified as element pairs whose aggregated similarity exceeds a threshold from both directions, i.e., for both the source and the target ontology. To cope with very large ontologies, QOM supports an iterative match approach to successively reduce the search space for match candidates. In particular, candidate correspondences to be examined next are identified from the neighborhood of previously matched elements according to name similarity or ontology structure.
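A minimal Python sketch of the similarity computation and selection steps follows. It is an illustration under several assumptions: the feature kinds, weights, threshold, and toy classes are hypothetical, the set comparison (symmetric best-match average) is one simple choice among many, and "from both directions" is read here as requiring a pair to be the best candidate for both its source and its target element. The search-space reduction is not covered.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Element = Dict[str, List[str]]  # feature kind -> names of related elements


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute or keep
        prev = cur
    return prev[-1]


def name_sim(s: str, t: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s.lower(), t.lower()) / longest


def set_sim(a: List[str], b: List[str]) -> float:
    """Symmetric best-match average between two lists of names (assumed)."""
    if not a or not b:
        return 0.0
    a_to_b = sum(max(name_sim(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(name_sim(x, y) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


WEIGHTS = {"label": 0.3, "properties": 0.7}  # assumed weights


def aggregated_sim(src: Element, tgt: Element) -> float:
    """Weighted sum over the similarities of each kind of related element."""
    return sum(w * set_sim(src[kind], tgt[kind]) for kind, w in WEIGHTS.items())


def select_both_directions(source: Dict[str, Element],
                           target: Dict[str, Element],
                           threshold: float = 0.5) -> Set[Tuple[str, str]]:
    """Keep pairs above the threshold that are mutual best candidates."""
    sim = {(s, t): aggregated_sim(source[s], target[t])
           for s, t in product(source, target)}
    selected = set()
    for (s, t), value in sim.items():
        best_for_s = max(sim[(s, other)] for other in target)
        best_for_t = max(sim[(other, t)] for other in source)
        if value > threshold and value == best_for_s and value == best_for_t:
            selected.add((s, t))
    return selected


SOURCE = {"Author": {"label": ["Author"], "properties": ["name", "email"]},
          "Book": {"label": ["Book"], "properties": ["title", "isbn"]}}
TARGET = {"Writer": {"label": ["Writer"], "properties": ["name", "e-mail"]},
          "Publication": {"label": ["Publication"],
                          "properties": ["title", "isbn", "year"]}}

print(sorted(select_both_directions(SOURCE, TARGET)))
```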

SCM (2004)

SCM (Semantic Category Matching) [68] performs a statistical analysis on instance data to find correspondences between classes and properties of two ontologies. For each element, SCM determines a feature vector containing occurrence statistics of all keywords found in the instances. It then performs a pairwise comparison of the feature vectors, restricting the comparison to the keywords contained in both vectors. Elements with similar feature vectors are further examined by a structural matcher. This matcher successively chooses an element pair as reference and weights the other pairs according to their location consistency, e.g., whether both elements are descendants of the respective reference elements. Finally, correspondences are selected according to their average structural consistency weights.
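The vector comparison restricted to shared keywords can be sketched in Python as follows. Raw keyword counts as the occurrence statistic and cosine as the comparison measure are assumptions, since the text above specifies neither; the instance values and function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Iterable


def keyword_vector(instances: Iterable[str]) -> Counter:
    """Count how often each whitespace-separated keyword occurs."""
    return Counter(word for text in instances for word in text.lower().split())


def shared_keyword_cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity computed only over keywords present in both vectors."""
    shared = u.keys() & v.keys()
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(u[k] ** 2 for k in shared))
    norm_v = sqrt(sum(v[k] ** 2 for k in shared))
    return dot / (norm_u * norm_v)


# Hypothetical instance values of three properties from two ontologies.
title_a = keyword_vector(["data integration", "data matching"])
title_b = keyword_vector(["data integration", "ontology matching"])
price_b = keyword_vector(["19.99", "5.00"])

print(shared_keyword_cosine(title_a, title_b))  # high: similar keyword usage
print(shared_keyword_cosine(title_a, price_b))  # 0.0: no shared keywords
```

Note that restricting the norms to the shared keywords makes two vectors with a single keyword in common maximally similar. This may be one reason why a structural matcher is applied afterwards.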

S-MATCH (2004)

S-MATCH [53, 54, 3] represents a logic-based approach to identify semantic relationships (equivalence, more general, less general, overlapping, mismatch) between concepts of taxonomies. In the first step, relations between nodes are derived from their name similarity, either by looking them up in auxiliary sources, such as WordNet, domain ontologies, and thesauri, or by using string matching algorithms. In the second step, S-MATCH tries to determine relations between paths, i.e., sequences of nodes from the root to a node. Each pair of paths is successively checked against the different semantic relationships to find the most suitable one as the match result. On the one side, an axiom is constructed as the conjunction of all relations known from the first step between the nodes of the two paths. On the other side, the semantic relationship to be checked is represented as a propositional formula, in which each path appears as the conjunction of its nodes. The match problem, i.e., deciding whether a particular relationship holds between two paths, is translated to a Boolean satisfiability problem by testing whether the propositional formula can be derived from the axiom. To solve it, S-MATCH exploits established techniques from the SAT field (SAT solvers).

Mork and Bernstein (2004)

Mork and Bernstein [101] adapt the existing prototypes CUPID and SIMILARITYFLOODING to perform matching on very large medical ontologies. Their approach merges the results of three phases: lexical, structural, and hierarchical matching (i.e., Max aggregation). Using CUPID, lexical matching identifies concepts with similar names, exploiting external dictionaries for synonym and word usage information. Structural matching is based on a variant of SIMILARITYFLOODING and searches for concepts with similar neighbors. Finally, hierarchical matching identifies concepts with similar descendants. To reduce the match complexity for large ontologies, structural matching only focuses on matching relationships, while hierarchical matching only considers direct children and grandchildren.
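Merging phase results by Max aggregation can be sketched in a few lines of Python. The per-phase similarity scores, the concept names, and the function name are hypothetical; the sketch only shows that a pair found by any one phase is kept with its highest score.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def max_aggregate(*phase_results: Dict[Pair, float]) -> Dict[Pair, float]:
    """Union of all phase results, keeping the highest score per pair."""
    merged: Dict[Pair, float] = {}
    for result in phase_results:
        for pair, score in result.items():
            merged[pair] = max(score, merged.get(pair, 0.0))
    return merged


# Hypothetical outputs of the three phases on toy concept names.
lexical = {("Heart", "Cor"): 0.9, ("Lung", "Pulmo"): 0.4}
structural = {("Lung", "Pulmo"): 0.7}
hierarchical = {("Thorax", "Chest"): 0.6, ("Heart", "Cor"): 0.5}

merged = max_aggregate(lexical, structural, hierarchical)
print(merged)
```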
