Ontology matching: state of the art and future challenges


Pavel Shvaiko, Jérôme Euzenat

To cite this version:

Pavel Shvaiko, Jérôme Euzenat. Ontology matching: state of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, 2013, 25 (1), pp.158-176. <10.1109/TKDE.2011.253>. <hal-00917910>

HAL Id: hal-00917910

https://hal.inria.fr/hal-00917910

Submitted on 12 Dec 2013



Pavel Shvaiko and Jérôme Euzenat

Abstract—After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, albeit at a slowing pace. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.

Index Terms—Semantic heterogeneity, semantic technologies, ontology matching, ontology alignment, schema matching.


1 INTRODUCTION

The progress of information and communication technologies has made available a huge amount of disparate information. The problem of managing heterogeneity among various information resources is increasing. For example, most of the database research self-assessment reports recognize that the thorny question of semantic heterogeneity, that is, of handling variations in meaning or ambiguity in entity interpretation, remains open [1]. As a consequence, various solutions have been proposed to facilitate dealing with this situation, and specifically, to automate integration of distributed information sources. Among these, semantic technologies have attracted particular attention. In this paper we focus on one kind of semantic technology, namely, ontology matching.

An ontology typically provides a vocabulary that describes a domain of interest and a specification of the meaning of terms used in the vocabulary. Depending on the precision of this specification, the notion of ontology encompasses several data and conceptual models, including sets of terms, classifications, thesauri, database schemas, or fully axiomatized theories [2]. When several competing ontologies are used in different applications, most often these applications cannot immediately interoperate. In this paper we consider ontologies expressed in OWL as a typical example of a knowledge representation language on which most of the issues can be illustrated. OWL is succeeding to a large degree as a knowledge representation standard, for instance, used for building knowledge systems. However, several matching systems discussed in the paper are able to deal with RDFS or SKOS as well. Database schemas and ontologies share similarity since they both provide a vocabulary of terms and somewhat constrain the meaning of terms used in the vocabulary. Hence, they often share similar matching solutions [3–7]. Therefore, we discuss in this paper approaches that come from semantic web and artificial intelligence as well as from databases.

• Pavel Shvaiko is with TasLab, Informatica Trentina SpA, Via G. Gilli 2, 38121 Trento, Italy. E-mail: [email protected]

• Jérôme Euzenat is with INRIA & LIG, 655 avenue de l’Europe, 38334 Saint-Ismier, France. E-mail: [email protected]

Overcoming semantic heterogeneity is typically achieved in two steps, namely: (i) matching entities to determine an alignment, i.e., a set of correspondences, and (ii) interpreting an alignment according to application needs, such as data translation or query answering. We focus only on the matching step.

Ontology matching is a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering, or data translation. Thus, matching ontologies enables the knowledge and data expressed with respect to the matched ontologies to interoperate [2]. Diverse solutions for matching have been proposed in the last decades [8, 9]. Several recent surveys [10–16] and books [2, 7] have been written on the topic as well.

As evaluations of recent years indicate, the field of ontology matching has made measurable improvement, albeit at a slowing pace. In order to achieve similar or better results in the forthcoming years, actions have to be taken. We believe this can be done through addressing specifically promising challenges that we identify as: (i) large-scale matching evaluation, (ii) efficiency of matching techniques, (iii) matching with background knowledge, (iv) matcher selection, combination and tuning, (v) user involvement, (vi) explanation of matching results, (vii) social and collaborative matching, (viii) alignment management: infrastructure and support.

This article is an expanded and updated version of an earlier invited conference paper [17]. The first contribution of this work is a review of the state of the art backed up with analytical and experimental comparisons. Its second contribution is an in-depth discussion of the challenges in the field, of the recent advances made in the areas of each of the challenges, and an outline of potentially useful approaches to tackle the challenges identified.

The remainder of the paper is organized as follows. Section 2 presents the basics of ontology matching. Section 3 outlines some ontology matching applications. Sections 4 and 5 discuss the state of the art in ontology matching together with analytical and experimental comparisons. Section 6 overviews the challenges of the field, while Sections 7–14 discuss them in detail. Finally, Section 15 provides the major conclusions.

2 THE ONTOLOGY MATCHING PROBLEM

In this section we first discuss a motivating example (§2.1) and then we provide some basics of ontology matching (§2.2).

2.1 Motivating example

In order to illustrate the matching problem let us use the two simple ontologies, O1 and O2, of Figure 1. Classes are shown in rectangles with rounded corners, e.g., in O1, Book being a specialization (subclass) of Product, while relations are shown without the latter, such as price being an attribute defined on the integer domain and creator being a property. Albert Camus: La chute is a shared instance. Correspondences are shown as thick arrows that link an entity from O1 with an entity from O2. They are annotated with the relation that is expressed by the correspondence: for example, Person in O1 is less general (⊑) than Human in O2.

Fig. 1: Two simple ontologies and an alignment. Labels shown for O1: Product, Book, CD, price, title, doi, creator, author, integer, string, Person. Labels shown for O2: Monograph, Essay, Literary critics, Politics, Biography, Literature, isbn, title, subject, Human, Writer. Shared instance: Albert Camus: La chute. The correspondences are annotated with =, ⊒ and ⊑.

Assume that an e-commerce company acquires another one. Technically, this acquisition requires the integration of their information sources, and hence, of the ontologies of these companies. The documents or instance data of both companies are stored according to ontologies O1 and O2, respectively. In our example these ontologies contain subsumption statements, property specifications and instance descriptions. The first step in integrating ontologies is matching, which identifies correspondences, namely the candidate entities to be merged or to have subsumption relationships under an integrated ontology. Once the correspondences between two ontologies have been determined, they may be used, for instance, for generating query expressions that automatically translate instances of these ontologies under an integrated ontology [18]. For example, the attributes with labels title in O1 and in O2 are the candidates to be merged, while the class with label Monograph in O2 should be subsumed by the class Product in O1.

2.2 Problem statement

There have been different formalizations of the matching operation and its result [11, 14, 19–21]. We follow the work in [2] that provided a unified account over the previous works.

The matching operation determines an alignment A′ for a pair of ontologies O1 and O2. Hence, given a pair of ontologies (which can be very simple and contain one entity each), the matching task is that of finding an alignment between these ontologies. There are some other parameters that can extend the definition of matching, namely: (i) the use of an input alignment A, which is to be extended; (ii) the matching parameters, for instance, weights, or thresholds; and (iii) external resources, such as common knowledge and domain specific thesauri, see Figure 2.

Fig. 2: The ontology matching operation. The matching takes as input the ontologies O1 and O2, an input alignment A, parameters and resources, and returns the alignment A′.

We use interchangeably the terms matching operation, thereby focussing on the input and the result; matching task, thereby focussing on the goal and the insertion of the task in a wider context; and matching process, thereby focussing on its internals.

It can be useful to specifically consider matching more than two ontologies within the same process [22], though this is out of the scope of this paper. An alignment is a set of correspondences between entities belonging to the matched ontologies. Alignments can be of various cardinalities: 1:1 (one-to-one), 1:m (one-to-many), n:1 (many-to-one) or n:m (many-to-many).

Given two ontologies, a correspondence is a 4-tuple:

⟨id, e1, e2, r⟩,

such that:

• id is an identifier for the given correspondence;

• e1 and e2 are entities, e.g., classes and properties of the first and the second ontology, respectively;

• r is a relation, e.g., equivalence (=), more general (⊒), disjointness (⊥), holding between e1 and e2.

The correspondence ⟨id, e1, e2, r⟩ asserts that the relation r holds between the ontology entities e1 and e2. For example, ⟨id_{7,1}, Book, Monograph, ⊒⟩ asserts that Book in O1 is more general (⊒) than Monograph in O2. Correspondences have some associated metadata, such as the correspondence author name. A frequently used metadata element is a confidence in the correspondence (typically in the [0, 1] range). The higher the confidence, the higher the likelihood that the relation holds.
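The definitions above map naturally onto a small data structure. The following Python sketch is only an illustration: the class name, the field names, the helper at_least and the confidence values are assumptions made for this example and are not prescribed by the paper or by any alignment format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """A correspondence <id, e1, e2, r> with a confidence as metadata."""
    id: str
    e1: str                   # entity of the first ontology (O1)
    e2: str                   # entity of the second ontology (O2)
    r: str                    # relation, e.g. "=", "⊒", "⊑", "⊥"
    confidence: float = 1.0   # expected to lie in the [0, 1] range

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# An alignment is a set of correspondences.
# The confidence values below are invented for illustration.
alignment = {
    Correspondence("id1", "Book", "Monograph", "⊒", 0.9),
    Correspondence("id2", "title", "title", "=", 1.0),
    Correspondence("id3", "Person", "Human", "⊑", 0.6),
}


def at_least(alignment, threshold):
    """Keep only the correspondences whose confidence reaches the threshold."""
    return {c for c in alignment if c.confidence >= threshold}


confident = at_least(alignment, 0.8)
```

Representing the alignment as a set of immutable records keeps duplicate correspondences out and makes threshold-based filtering a one-line operation.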

3 APPLICATIONS

Ontology matching is an important operation in traditional applications, e.g., ontology evolution [23], ontology integration [24], data integration [25], and data warehouses [26]. These applications are characterized by heterogeneous models, e.g., database schemas or ontologies, that are analyzed and matched manually or semi-automatically at design time. In such applications, matching is a prerequisite to running the actual system.

There are some emerging applications that can be characterized by their dynamics, such as peer-to-peer information sharing [27], web service composition [28], search [29], and query answering [22]. Such applications, contrary to traditional ones, require (ultimately) a run time matching operation and take advantage of more explicit conceptual models. A detailed description of these applications as well as of the requirements they pose to matching can be found in [2]. We illustrate only some of these applications with the help of two short real-world examples in order to facilitate the comprehension of the forthcoming material.

Cultural heritage. A typical situation consists of having several large thesauri, such as Iconclass² (25,000 entities) and the Aria collection (600 terms) from the Rijksmuseum³. The documents indexed by these thesauri are illuminated manuscripts and masterpieces, i.e., image data. The labels are gloss-like, i.e., sentences or phrases describing the concept, since they have to capture what is depicted on a masterpiece. Examples of labels from Iconclass include: “city-view, and landscape with man-made constructions” and “earth, world as celestial body”. In contrast to Iconclass, Aria uses simple terms as labels. Examples of these include: “landscapes”, “personifications” and “wild animals”. Matching between these thesauri (that can be performed at design time) is required in order to enable an integrated access to the masterpieces of both collections. Specifically, alignments can be used as navigation links within a multi-faceted browser to access a collection via thesauri it was not originally indexed with [30].

2. http://www.iconclass.nl/

3. http://www.rijksmuseum.nl/collectie/index.jsp?lang=en

Geo-information (GI). A typical situation at an urban planning department of a public administration consists of a simple keyword-like request for a map generation, such as: “hydrography, Trento, January 2011”. This request is a set of terms covering spatial (Trento) and temporal (January 2011) aspects to be addressed while looking for a specific theme, that is, hydrography. Handling such a request involves interpreting the user query at run time and creating an alignment between the relevant GI resources, such as those having up-to-date (January 2011) topography and hydrography maps of Trento, in order to ultimately compose these into a single one. Technically, alignments are used in such a setting for query expansion. As for the thematic part, e.g., hydrography, standard matching technology can be widely reused [2, 32–34], while the spatial and temporal counterparts that constitute the specificity of GI applications have not received enough attention so far in the ontology matching field (with exceptions, such as [35, 36]), and hence, this gap will have to be covered in the future.

4 RECENT MATCHING SYSTEMS

We now review several state of the art matching systems (§4.1–§4.7) that appeared in recent years and have not been covered by the previous surveys (§1).

Among the several dozens of systems that have appeared in these recent years, we selected some which (i) have repeatedly participated in the Ontology Alignment Evaluation Initiative (OAEI) campaigns (see §5) in order to have a basis for comparisons and (ii) have corresponding archival publications, hence the complete account of these works is also available. An overview of the considered systems is presented in Table 1. The first half of the table provides a general outlook over the systems. The input column presents the input format used by the systems, the output column describes the cardinality of the computed alignment (see §2.2), the GUI column shows if a system is equipped with a graphical user interface, and the operation column describes the ways in which a system can process alignments. The second half of the table classifies the available matching methods depending on which kind of data the algorithms work on: strings (terminological), structure (structural), data instances (extensional) or models (semantic). Strings and structures are found in the ontology descriptions, e.g., labels, comments, attributes and their types, relations of entities with other entities. Instances constitute the actual population of an ontology. Models are the result of semantic interpretation and usually use logic reasoning to deduce correspondences. Table 1 illustrates particular matching methods employed by the systems under consideration. Below, we discuss these systems in more detail.

| System | Input | Output | GUI | Operation | Terminological | Structural | Extensional | Semantic |
|---|---|---|---|---|---|---|---|---|
| SAMBO (§4.1) | OWL | 1:1 alignments | Yes | Ontology merging | n-gram, edit distance, UMLS, WordNet | Iterative structural similarity based on is-a, part-of hierarchies | Naive Bayes over documents | - |
| Falcon (§4.2) | RDFS, OWL | 1:1 alignments | - | - | I-SUB, virtual documents | Structural proximities, clustering, GMO | Object similarity | - |
| DSSim (§4.3) | OWL, SKOS | 1:1 alignments | AQUA Q/A [31] | Question answering | Tokenization, Monge-Elkan, Jaccard, WordNet | Graph similarity based on leaves | - | Rule-based fuzzy inference |
| RiMOM (§4.4) | OWL | 1:1 alignments | - | - | Edit distance, vector distance, WordNet | Similarity propagation | Vector distance | - |
| ASMOV (§4.5) | OWL | n:m alignments | - | - | Tokenization, string equality, Levenshtein distance, WordNet, UMLS | Iterative fix point computation, hierarchical, restriction similarities | Object similarity | Rule-based inference |
| Anchor-Flood (§4.6) | RDFS, OWL | 1:1 alignments | - | - | Tokenization, string equality, Winkler-based sim., WordNet | Internal, external similarities; iterative anchor-based similarity propagation | - | - |
| AgreementMaker (§4.7) | XML, RDFS, OWL, N3 | n:m alignments | Yes | - | TF·IDF, edit distance, substrings, WordNet | Descendant, sibling similarities | - | - |

TABLE 1: Analytical comparison of the recent matching systems.

4.1 SAMBO (Linköpings U.)

SAMBO is a system for matching and merging biomedical ontologies [37]. It handles ontologies in OWL and outputs 1:1 alignments between concepts and relations. The system uses various similarity-based matchers, including:

• terminological: n-gram, edit distance, comparison of the lists of words of which the terms are composed. The results of these matchers are combined via a weighted sum with pre-defined weights;

• structural, through an iterative algorithm that checks if two concepts occur in similar positions with respect to is-a or part-of hierarchies relative to already matched concepts, with the intuition that the concepts under consideration are likely to be similar as well;

• background knowledge based, using (i) a relationship between the matched entities in UMLS (Unified Medical Language System) [38] and (ii) a corpus of knowledge collected from the published literature exploited through a naive Bayes classifier.

The results produced by these matchers are combined based on user-defined weights. Then, filtering based on thresholds is applied to come up with an alignment suggestion, which is further displayed to the user for feedback (approval, rejection or modification). Once matching has been accomplished, the system can merge the matched ontologies, compute the consequences, check the newly created ontology for consistency, etc. SAMBO has been subsequently extended into a toolkit for evaluation of ontology matching strategies, called KitAMO [39].
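The weighted combination followed by threshold filtering can be sketched in a few lines of Python. This is not SAMBO's code: the character trigram measure (Dice coefficient), the word-list measure (Jaccard coefficient), the equal weights and the 0.6 threshold are all assumptions chosen for illustration.

```python
def ngrams(label, n=3):
    """Set of character n-grams of a lowercased label."""
    s = label.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-grams (an assumed variant)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def word_similarity(a, b):
    """Jaccard coefficient over the words the two terms are composed of."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def combined_similarity(a, b, weights=(0.5, 0.5)):
    """Weighted sum of the matchers; the weights are illustrative."""
    return weights[0] * ngram_similarity(a, b) + weights[1] * word_similarity(a, b)


def suggest(labels1, labels2, threshold=0.6):
    """Return label pairs whose combined similarity reaches the threshold."""
    scored = [(a, b, combined_similarity(a, b)) for a in labels1 for b in labels2]
    kept = [t for t in scored if t[2] >= threshold]
    return sorted(kept, key=lambda t: t[2], reverse=True)


suggestions = suggest(["nose", "inner ear"], ["Nose", "ear inner", "eye"])
```

In this toy run the word-list matcher compensates for the lower trigram score of the reordered label "ear inner", which is the kind of complementarity a weighted sum is meant to exploit. In a setting like the one described above, the suggestions would then be shown to the user for approval or rejection.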

4.2 Falcon (Southeast U.)

Falcon is an automatic divide-and-conquer approach to ontology matching [40]. It handles ontologies in RDFS and OWL. It has been designed with the goal of dealing with large ontologies (of thousands of entities). The approach operates in three phases: (i) partitioning ontologies, (ii) matching blocks, and (iii) discovering alignments. The first phase starts with a structure-based partitioning to separate entities (classes and properties) of each ontology into a set of small clusters. Partitioning is based on structural proximities between classes and properties, e.g., how close the classes are in the hierarchies of rdfs:subClassOf relations, and on an extension of the Rock agglomerative clustering algorithm [41]. Then it constructs blocks out of these clusters. In the second phase the blocks from distinct ontologies are matched based on anchors (pairs of entities matched in advance), i.e., the more anchors are found between two blocks, the more similar the blocks are. In turn, the anchors are discovered by matching entities with the help of the I-SUB string comparison technique [42]. The block pairs with high similarities are selected based on a cutoff threshold. Notice that each block is just a small fragment of an ontology. Finally, in the third phase the results of the so-called V-Doc (a linguistic matcher) and GMO (an iterative structural matcher) techniques are combined via sequential composition to discover alignments between the matched block pairs. Ultimately, the output alignment is extracted through a greedy selection.
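The final greedy extraction step can be illustrated as follows. The paper does not detail Falcon's selection procedure, so this Python sketch shows one common greedy strategy for obtaining a 1:1 alignment; the function name, the threshold parameter and the scores are hypothetical.

```python
def greedy_select(scores, threshold=0.0):
    """Extract a 1:1 alignment from pairwise similarity scores.

    scores maps (entity_of_O1, entity_of_O2) to a similarity value.
    Pairs are visited from the highest score down; a pair is kept only
    if neither of its entities has been used by a better pair.
    """
    selected = []
    used1, used2 = set(), set()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (e1, e2), score in ranked:
        if score < threshold:
            break  # all remaining pairs score even lower
        if e1 in used1 or e2 in used2:
            continue
        selected.append((e1, e2, score))
        used1.add(e1)
        used2.add(e2)
    return selected


# Invented scores for the entities of the motivating example.
scores = {
    ("Book", "Monograph"): 0.9,
    ("Book", "Essay"): 0.8,
    ("CD", "Essay"): 0.3,
    ("Person", "Human"): 0.7,
}
selected = greedy_select(scores, threshold=0.5)
```

Greedy selection is fast and simple, but it does not necessarily maximize the total similarity of the extracted alignment: here Book is bound to Monograph first, which leaves Essay unmatched once the threshold discards the pair (CD, Essay).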

4.3 DSSim (Open U., Poznan U. of Economics)

DSSim is an agent-based ontology matching framework. The system handles large-scale ontologies in OWL and SKOS and computes 1:1 alignments with equivalence and subsumption relations between concepts and properties. It uses the Dempster-Shafer [43] theory in the context of query answering [44, 45]. Specifically, each agent builds a belief for the correctness of a particular correspondence hypothesis. Then, these beliefs are combined into a single more coherent view in order to improve correspondence quality. The ontologies are initially partitioned into fragments. Each concept or property of a first ontology fragment is viewed as a query, which is expanded based on hypernyms from WordNet [46], viewed as background knowledge. These hypernyms are used as variables in the hypothesis to enhance the beliefs. The expanded concepts and properties are matched syntactically to the similar concepts and properties of the second ontology in order to identify a relevant graph fragment of the second ontology. Then, the query graph of the first ontology is matched against the relevant graph fragment of the second ontology. For that purpose, various terminological similarity measures are used, such as Monge-Elkan and Jaccard distances, which are combined using Dempster’s rule. Similarities are viewed as different experts in the evidence theory and are used to assess quantitative similarity values (converted into belief mass functions) that populate the similarity matrices. The resulting correspondences are selected based on the highest belief function over the combined evidences. Possible conflicts among beliefs are resolved by using a fuzzy voting approach equipped with four ad hoc if-then rules. The system does not have a dedicated user interface but uses that of the AQUA question answering system [31], which is able to handle natural language queries.
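Dempster’s rule of combination itself is standard and can be sketched compactly in Python. How DSSim converts similarity values into belief mass functions is not specified here, so the helper mass_from_similarity below (all non-committed mass goes to the whole frame) is an assumption made for this example, as are the similarity values.

```python
from itertools import product

MATCH, NO_MATCH = "match", "no_match"
THETA = frozenset({MATCH, NO_MATCH})  # frame of discernment


def mass_from_similarity(sim):
    """Assumed conversion: the similarity supports 'match' and the
    remaining mass expresses ignorance (it is assigned to THETA)."""
    return {frozenset({MATCH}): sim, THETA: 1.0 - sim}


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Mass functions map focal elements (frozensets) to masses that sum
    to 1. Products falling on an empty intersection are conflict and
    are removed by normalization.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Two "experts" (similarity measures) assessing the same hypothesis.
expert_a = mass_from_similarity(0.6)
expert_b = mass_from_similarity(0.7)
agreement = dempster_combine(expert_a, expert_b)

# A third expert that supports the opposite hypothesis introduces conflict.
expert_c = {frozenset({NO_MATCH}): 0.5, THETA: 0.5}
disputed = dempster_combine(expert_a, expert_c)
```

When two experts agree, the combined belief in the match (0.88 here) exceeds either individual belief; when they disagree, the conflicting mass is discarded and the rest is renormalized.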

4.4 RiMOM (Tsinghua U., Hong Kong U. of Science and Technology)

RiMOM is a dynamic multi-strategy ontology matching framework [47]. It extends a previous version of the system [48] that focused on combining multiple matching strategies, through risk minimization of Bayesian decision. The new version [47] quantitatively estimates the similarity characteristics for each matching task. These characteristics are used for dynamically selecting and combining the multiple matching methods. Two basic matching methods are employed: (i) linguistic similarity (edit distance over entity labels, vector distance among comments and instances of entities) and (ii) structural similarity (a variation of Similarity Flooding [49] implemented as three similarity propagation strategies: concept-to-concept, property-to-property and concept-to-property). In turn, the strategy selection uses label and structure similarity factors, obtained as a preprocessing of the ontologies to be matched, in order to determine what information should be employed in the matching process. Specifically, the strategy selection dynamically regulates the concrete feature selection for linguistic matching, the combination of weights for similarity combination, and the choice of the concrete similarity propagation strategy. After similarity propagation, the matching process concludes with alignment refinement and extraction of the final result.

4.5 ASMOV (INFOTECH Soft, Inc., U. of Miami)

ASMOV (Automatic Semantic Matching of Ontologies with Verification) is an automatic approach for ontology matching that targets information integration for bioinformatics [50]. Overall, the approach can be summarized in two steps: (i) similarity calculation, and (ii) semantic verification. It takes as input two OWL ontologies and an optional input alignment and returns as output an n:m alignment between ontology entities (classes and properties). In the first step it uses lexical (string equality, a variation of Levenshtein distance), structural (weighted sum of the domain and range similarities) and extensional matchers to iteratively compute similarity measures between two ontologies, which are then aggregated into a single one as a weighted average. It also uses several sources of general and domain specific background knowledge, such as WordNet and UMLS, to provide more evidence for similarity computation. Then, it derives an alignment and checks it for inconsistency. Consistency checking is pattern based, i.e., instead of using a complete solver, the system recognizes sets of correspondences that are proved to lead to an inconsistency. The semantic verification process examines five types of patterns, e.g., disjoint-subsumption contradiction, subsumption incompleteness. This matching process is repeated with the obtained alignment as input until no new correspondences are found.

4.6 Anchor-Flood (Toyohashi U. of Technology)

The Anchor-Flood approach aims at handling particularly large ontologies efficiently [51]. It inputs ontologies in RDFS and OWL and outputs 1:1 alignments. The system starts with a pair of similar concepts from two ontologies called an anchor, e.g., all exactly matched normalized concepts are considered as anchors. Then, it gradually proceeds by analyzing the neighbors, i.e., super-concepts, sub-concepts, siblings, of each anchor, thereby building small segments (fragments) out of the ontologies to be matched. The size of the segments is defined dynamically starting from an anchor and exploring the neighboring concepts until either all the collected concepts are explored or no new matching pairs are found. The system focuses on (local) segment-to-segment comparisons, thus it does not consider the entire ontologies, which improves the system scalability. It outputs a set of correspondences between concepts and properties of the semantically connected segments. For determining the correspondences between segments the approach relies on terminological (WordNet and Winkler-based string metrics) and structural similarity measures, which are further aggregated by also considering probable misalignments. The similarity between two concepts is determined by the ratio of the number of terminologically similar direct super-concepts to the total number of direct super-concepts. Retrieved (local) matching pairs are considered as anchors for further processing. The process is repeated until there are no more matching pairs to be processed.
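The anchor-driven exploration loop can be sketched as a breadth-first traversal. The Python code below is a strong simplification and not the Anchor-Flood implementation: normalized label equality stands in for the terminological and structural measures, the neighbor maps and the function names are hypothetical, and no 1:1 constraint is enforced.

```python
from collections import deque


def normalize(label):
    """Lowercase a label and keep only its alphanumeric characters."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def anchor_flood(neighbors1, neighbors2, anchors):
    """Grow a set of matching pairs outwards from the given anchors.

    neighbors1 and neighbors2 map each concept to the set of its
    neighbors (super-concepts, sub-concepts, siblings) in O1 and O2.
    Only the neighborhoods of already matched pairs are compared, so
    the ontologies are never compared as a whole.
    """
    aligned = set(anchors)
    queue = deque(anchors)
    while queue:
        c1, c2 = queue.popleft()
        for n1 in sorted(neighbors1.get(c1, ())):
            for n2 in sorted(neighbors2.get(c2, ())):
                pair = (n1, n2)
                # Stand-in similarity test: equality of normalized labels.
                if pair not in aligned and normalize(n1) == normalize(n2):
                    aligned.add(pair)
                    queue.append(pair)  # each new pair acts as an anchor
    return aligned


neighbors1 = {"Product": {"Book", "CD"}, "Book": {"Product", "Novel"}}
neighbors2 = {"product": {"book", "DVD"}, "book": {"product", "novel"}}
result = anchor_flood(neighbors1, neighbors2, [("Product", "product")])
```

The loop stops when the queue is empty, that is, when no new matching pairs are found in the neighborhoods explored so far.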

4.7 AgreementMaker (U. of Illinois at Chicago)

AgreementMaker is a system comprising a wide range of automatic matchers, an extensible and modular architecture, a multi-purpose user interface, a set of evaluation strategies, and various manual, e.g., visual comparison, and semi-automatic features, e.g., user feedback [52]. It has been designed to handle large-scale ontologies based on the requirements coming from various domains, such as the geospatial and biomedical domains. The system handles ontologies in XML, RDFS, OWL, N3 and outputs 1:1, 1:m, n:1, n:m alignments. In general, the matching process is organized into two modules: similarity computation and alignment selection. The system combines matchers using three layers:

• The matchers of the first layer compare concept features, such as labels, comments, instances, which are represented as TF·IDF vectors used with a cosine similarity metric. Other string-based measures, e.g., edit distance, substrings, may be used as well.

• The second layer uses structural ontology properties and includes two matchers called descendants similarity inheritance (if two nodes are matched with high similarity, then the similarity between the descendants of those nodes should increase) and siblings similarity contribution (which uses the relationships between sibling concepts) [33].

• At the third layer, a linear weighted combination is computed over the results coming from the first two layers, whose results are further pruned based on thresholds and desired output cardinalities of the correspondences.
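The first-layer idea of comparing concept features as TF·IDF vectors with a cosine metric can be illustrated in plain Python. This sketch is not AgreementMaker's code: the whitespace tokenization, the smoothed IDF formula and the concept profiles are assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(profiles):
    """Build a sparse TF-IDF vector per concept profile.

    profiles maps a concept name to the text of its features (labels,
    comments). IDF is smoothed as log((1 + N) / (1 + df)) + 1, one of
    several common variants, assumed here for illustration.
    """
    tokens = {name: text.lower().split() for name, text in profiles.items()}
    n_docs = len(tokens)
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))
    vectors = {}
    for name, toks in tokens.items():
        term_freq = Counter(toks)
        vectors[name] = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature texts for three concepts of the running example.
profiles = {
    "O1:Book": "book written work sold as a product",
    "O2:Monograph": "monograph written work on a single subject",
    "O2:Human": "human being person",
}
vectors = tfidf_vectors(profiles)
sim_book_monograph = cosine(vectors["O1:Book"], vectors["O2:Monograph"])
sim_book_human = cosine(vectors["O1:Book"], vectors["O2:Human"])
```

Scores of this kind would feed the third layer, where they are linearly combined with the structural scores and pruned by thresholds and cardinality constraints.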

The system has a sophisticated user interface deeply integrated with the evaluation of ontology alignment quality, being an integral part of the matching process, thus empowering users with more control over it.

4.8 Analytical summary

The following can be observed concerning the considered systems (§4.1–§4.7, see also Table 1):

• The approaches equally pursue the development of generic matchers, e.g., Falcon, RiMOM, Anchor-Flood, as well as those focusing on particular application domains, e.g., SAMBO, ASMOV, that target primarily biomedical applications.

• Most of the systems under consideration claim to handle large-scale ontologies efficiently, i.e., tens of thousands of entities (see some experimental comparisons in §5). This is often achieved through employing various ontology partitioning and anchor-based strategies, such as in Falcon, DSSim or Anchor-Flood.

• Although all systems can deal with OWL (being an OAEI requirement), many of them can also be applied to RDFS or SKOS.

• Most of the systems focus on discovering 1:1 alignments, yet several systems are able to discover n:m alignments. Moreover, most of the systems focus on computing equivalence relations, with the exception of DSSim, which is also able to compute subsumption relations.

• Many systems are not equipped with a graphical user interface, with several exceptions, such as SAMBO, DSSim, and AgreementMaker.

• Semantic and extensional methods are still rarely employed by the matching systems. In fact, most of the approaches are quite often based only on terminological and structural methods.

• Many systems have focussed on combining and extending the known methods. For example, the most popular of these are variations of edit distance and WordNet matchers as well as iterative similarity propagation as an adaptation of the Similarity Flooding algorithm. Thus, the focus was not on inventing fundamentally new methods, but rather on adapting and extending the existing methods.

5 RECENT MATCHING EVALUATIONS

We provide a comparative experimental review of the matching systems described previously (§4) in order to observe and measure empirically the progress made in the field. We base our analysis on the Ontology Alignment Evaluation Initiative (OAEI) and more precisely on its 2004–2010 campaigns [53–59]. OAEI is a coordinated international initiative that organizes annual evaluations of the increasing number of matching systems. It proposes matching tasks to participants and their results are evaluated with measures inspired by information retrieval. These are precision (which is a measure of correctness), recall (which is a measure of completeness) and F-measure, which aggregates them.
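These measures are straightforward to compute once the alignment found by a system and a reference alignment are available as sets. The Python sketch below assumes the balanced F-measure (the harmonic mean of precision and recall) and represents correspondences as simple tuples; the example correspondences are invented.

```python
def evaluate(found, reference):
    """Precision, recall and balanced F-measure of an alignment.

    found and reference are collections of hashable correspondences,
    here (e1, e2, relation) tuples.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


reference = {
    ("Book", "Monograph", "⊒"),
    ("title", "title", "="),
    ("Person", "Human", "⊑"),
    ("author", "Writer", "⊒"),
}
found = {
    ("Book", "Monograph", "⊒"),
    ("title", "title", "="),
    ("CD", "Essay", "="),  # an incorrect correspondence
}
precision, recall, f_measure = evaluate(found, reference)
```

Here two of the three returned correspondences are correct (precision 2/3) and two of the four expected ones are found (recall 1/2), which gives an F-measure of 4/7.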

We consider here the three oldest test cases of OAEI in order to have a substantial set of data for comparison as well as diversity in tasks from automatically generated test cases to expressive ontologies. These are: benchmarks (§5.1), web directories (§5.2) and anatomy (§5.3). Participants were allowed to use one algorithm and the same set of parameters for all the test cases. Besides parameters, the input of the algorithms must be two ontologies to be matched and any general purpose resource available to everyone, i.e., resources designed especially for the test cases were not allowed; see [53] for further details.

5.1 Benchmarks

The goal of the benchmark test case is to provide a stable and detailed picture of each matching algorithm. For that purpose, the algorithms are run on systematically generated test cases.

Test data. The domain of this test case is bibliographic references. It aims at comparing an OWL-DL ontology containing more than 80 entities with its variations. Most of the variations are obtained by discarding features of the original ontology. Other variations select either unrelated ontologies or other available ontologies on the same topic.

Fig. 3: Benchmarks: comparison of matching quality results in 2004–2010. More systems are mentioned in the figure than those presented in §4. The results of these systems are given for the completeness of the presentation; see [53, 59] for further details.

Evaluation results. A comparative summary of the best results of OAEI on the benchmarks is shown in Figure 3. edna is a simple edit distance algorithm on labels, which is used as a baseline. For 2004, we maximized the results of the two best systems, Fujitsu and PromptDiff. The two best systems of the last several years are ASMOV [50] and Lily [60]. Their results are very comparable. Notable progress was made between 2004 and 2005 by Falcon, and the results of 2005 were repeated in 2009 by both ASMOV and Lily.
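To make the role of such a baseline concrete, the sketch below matches entities by a normalized edit distance on their labels. It only illustrates the general idea and is not the edna implementation used in OAEI; the threshold value, the function names and the toy labels are assumptions.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def label_similarity(a, b):
    """Edit distance turned into a case-insensitive similarity in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def baseline_match(labels1, labels2, threshold=0.8):
    """Keep every label pair whose similarity reaches the (assumed) threshold."""
    return {(l1, l2) for l1 in labels1 for l2 in labels2
            if label_similarity(l1, l2) >= threshold}


matches = baseline_match(["Article", "Proceedings", "Author"],
                         ["article", "Proceeding", "Writer"])
print(sorted(matches))
```

Such a matcher finds near-identical labels but misses synonyms such as Author and Writer, which is why it serves as a lower bound rather than as a competitive system.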

5.2 Directory

The directory test case aims at providing a challenging task in the domain of large directories constructed from Google, Yahoo and Looksmart web directories. These directories have vague terminology and modeling principles, thus, the matching tasks incorporate the typical uncontrolled open web modeling and terminological errors. The test case was built following the TaxMe2 methodology [61].

Test data. The data set is presented as taxonomies where the nodes of the web directories are modeled as classes and the classification relation connecting the nodes is modeled as rdfs:subClassOf. There are more than 4,500 node matching tasks, where each node matching task is composed of the paths to the root of the nodes in the web directories.

Fig. 4: Directory: comparison of matching quality results in 2006–2010. More systems are mentioned in the figure than those presented in §4. The results of these systems are given for the completeness of the presentation; see [53, 59] for further details.

Evaluation results. A comparison of the results in 2006–2010 for the top-3 systems of each year based on the highest F-measure is shown in Figure 4. A key observation is that from 2006 to 2007 we can measure a continuous improvement of the results, while in 2008 and 2009 the participating systems have either maintained or decreased their F-measure values. The best F-measure result of 2009 (0.63), achieved by ASMOV, is higher than the best F-measure of 2008 (0.49), demonstrated by DSSim [45]. It is also higher than that of 2006 by Falcon (0.43). It equals that of Prior+ [62] and is still lower than the best F-measure of 2007 (0.71) by OLA2 [63].

5.3 Anatomy

The focus of this test case is to confront existing matching technology with expressive ontologies in the biomedical domain. Two of its specificities are the specialized vocabulary of the domain and the usage of OWL modeling capabilities.


| System | Benchmarks (§5.1), F-measures in 2006–2010 | ±%B | Directory (§5.2), F-measures in 2006–2010 | ±%D | Anatomy (§5.3), F-measures in 2007–2010 | ±%A | Average ±% |
| SAMBO (§4.1) | 0.71, 0.88 | +24 | | | 0.82, 0.85 | +4 | +14 |
| Falcon (§4.2) | 0.89, 0.89, 0.73 | -18 | 0.43, 0.58 | +35 | 0.74 | n/a | +8 |
| DSSim (§4.3) | 0.70, 0.77, 0.92, 0.91 | +30 | 0.41, 0.49, 0.49 | +19 | 0.20, 0.62, 0.75 | +275 | +108 |
| RiMOM (§4.4) | 0.92, 0.91, 0.96, 0.96, 0.91 | -1 | 0.4, 0.55, 0.26 | -35 | 0.48, 0.82, 0.79 | +65 | +10 |
| ASMOV (§4.5) | 0.92, 0.96, 0.96, 0.93 | +1 | 0.5, 0.2, 0.63, 0.63 | +26 | 0.75, 0.71, 0.75, 0.79 | +5 | +11 |
| Anchor-Flood (§4.6) | 0.94, 0.95 | +1 | | | 0.77, 0.75 | -3 | -1 |
| AgreementMaker (§4.7) | 0.93, 0.89 | -4 | | | 0.41, 0.83, 0.88 | +115 | +56 |

TABLE 2: The progress made by the considered systems over the recent years (2006–2010). For each year we report the F-measure indicator obtained by the systems on three test cases: benchmarks, directory and anatomy. The empty cells mean that the corresponding systems did not participate in a test case in a particular year. The ±% column stands for the progress/regress made over the years, calculated as a percentage increase between the first and the last participation, e.g., for SAMBO on benchmarks resulting in 0.71 + 24% ≈ 0.88. The last column shows the average progress made by the systems over different years on different test cases, calculated as the average over ±%B, ±%D, ±%A, e.g., for AgreementMaker this results in (−4 + 115)/2 ≈ +56%.
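The two calculations described in the caption can be sketched as follows. The function names are hypothetical, and since the sketch starts from the rounded F-measures printed in the table, its output may differ slightly from the published percentages.

```python
def progress(first, last):
    """Percentage change between the first and the last reported F-measure."""
    return (last - first) / first * 100


def average_progress(changes):
    """Average of the per-test-case changes, skipping test cases without a value."""
    available = [change for change in changes if change is not None]
    return sum(available) / len(available)


# SAMBO on benchmarks: from 0.71 to 0.88, i.e., roughly +24%.
sambo_benchmarks = progress(0.71, 0.88)

# AgreementMaker: -4% on benchmarks, no directory value, +115% on anatomy.
agreementmaker_average = average_progress([-4, None, 115])

print(f"SAMBO benchmarks: {sambo_benchmarks:+.1f}%")
print(f"AgreementMaker average: {agreementmaker_average:+.1f}%")
```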

Test data. The ontologies are part of the Open Biomedical Ontologies (OBO), designed from the NCI Thesaurus (3304 classes) describing the human anatomy, published by the National Cancer Institute, and the Adult Mouse Anatomical Dictionary (2744 classes). This test case has been used since 2007, while in 2005 and 2006 it was run on different test data, which we do not consider here, focusing on the more recent results instead.

Fig. 5: Anatomy: comparison of matching quality results in 2007–2010. More systems are mentioned in the figure than those presented in §4. The results of these systems are given for the completeness of the presentation; see [53, 59] for further details.

Evaluation results. A comparison of the results in 2007–2010 for the top-3 systems of each year based on the highest F-measure is shown in Figure 5.

We can make two key observations. The first one is that a baseline label matcher based on string equality already provides quite good results, with an F-measure of 0.76. The second one is that in all the years the best F-measure remained stable at around 0.86. However, some progress has been made in terms of efficiency, i.e., the run time was reduced from days and hours to minutes and seconds. For example, the best run time of 15s in 2009 belongs to Anchor-Flood (its F-measure was 0.75). In 2007 and 2008 the competition was clearly dominated by the AOAS [64] and SAMBO systems, which heavily exploited background knowledge (UMLS), whereas in 2009 the best result, shown by Sobom [65], was obtained without using any background knowledge. Finally, in 2010 the best result was shown by AgreementMaker.

5.4 Experimental summary

As we can see from the previous subsections (§5.1–§5.3), various sets of systems participate in various test cases, but not necessarily in all of them. Not all the systems participated every year, which prevents measuring comprehensively the progress of each system over the years. In Table 2, when available, we report the F-measure obtained by these systems and the respective progress or regress made. From Table 2 we conclude that:

• Individual systems improve over the years on the same test cases. An exception is RiMOM on the directory test case, which can be explained by the new release of the system, which still required tuning (see [47]).

• Better matching quality on one task is not achieved at the expense of another task on average.

• The overall average improvement made by the individual systems on the test cases under consideration reaches a 108% increase, or 28 percentage points (by DSSim), in the recent years.

The average progress made by the OAEI participants in the three test cases considered, from the early years to the recent years, is ∼30% in terms of F-measure (the average of all progressions reported in Table 2), i.e., an increase of ∼10 percentage points in F-measure. Moreover, on the anatomy test case, the run time improved 37 times on average, from 692 min (about 11 hours) in 2007 to 18 min in 2009; see [53] for an in-depth discussion. Thus, measurable progress in terms of effectiveness and efficiency has been made by the automatic ontology matching systems in the recent years.

At present, in the database community, there are no well-established benchmarks for comparing schema matching tools. However, there are many recent database schema matching tools and, more generally, model management infrastructures, e.g., COMA++ [4], AgreementMaker [52], GeRoMe [66], Harmony [67], that are also able to process ontologies; hence, their developers might be interested in testing them within OAEI, as actually already happens, though modestly. On the other hand, OAEI has to consider including database schema matching tasks involving XML and relational schemas in order to improve the cross-fertilization between these communities.


6 TOWARDS THE CHALLENGES

After years of work on ontology matching, several questions arise: is the field still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? The previous section showed that the field is indeed making measurable progress, but the speed of this progress is slowing down and is becoming harder to determine. Also, the need for matching has risen in many different fields which have diverging demands, for instance, design time matching with correct and complete alignments, e.g., required when two banks merge, vs. run time matching with approximate alignments, e.g., acceptable in query answering on the web [2]. This calls for more precise and more versatile evaluations of matchers.

The second question concerns the significance of the obtained results. This question requires measurements as well: the OAEI evaluations measured a progress of ∼10 percentage points over the recent five to six years (§5.4). This is a sufficient result (compared to other fields of computer science) to pursue further research into it. Also, we can see from the benchmarks that, after a few years, the improvement rate is slowing down. Hence, in order to support a similar or a stronger growth in the forthcoming years, some specific actions have to be taken. In particular, we propose to guide the evolution of the ontology matching field by addressing some specific challenges. With respect to the third question, we offer eight challenges for ontology matching (see Table 3). The challenges under consideration are among the major ontology matching topics of the recent conferences in semantic web, artificial intelligence and databases.

If the design of matchers consists of further tuning similarity measures or issuing other combinations of matchers, revolutionary progress is not to be expected, but most likely only incremental progress, as §5 also suggests. Other open issues are the computation of expressive alignments, e.g., correspondences across classes and properties [47, 68], oriented alignments (with non equivalence relations) [69, 70], or cross-lingual matching [71, 72]. Such issues are gradually progressing within the ontology matching field. In the first years of OAEI, it was not possible to test such alignments because there were not enough matching systems able to produce them. Only recently, oriented matching datasets were introduced and there are more systems able to produce complex correspondences. Moreover, we consider these issues as too specific with respect to the other challenges discussed, so we did not retain them as challenges.

Breakthroughs can come from either completely different settings or classes of systems particularly adapted to specific applications. We can seek such improvements from recovering background knowledge (§9), for example, from the linked open data cloud, as it represents a large and continuously growing source of knowledge. Another source of quality gains is expected from the working environment in which matching is performed. Hence, work on involving users in matching (§11) or social and collaborative matching (§13) may provide surprising results. The challenges have been selected by focusing on pragmatic issues that should help consolidate the available work in the field and bring tangible results in the short to medium term, thus leaving most of the theoretical and less promising directions aside. For example, in [17] we also identified as challenges uncertainty in ontology matching [73, 74] and reasoning with alignments [75, 76]. These are challenging theoretical issues, but they have a long term impact; hence, we do not discuss them here.

Another point worth mentioning is the rise of linked data [77] and the subsequent need for data interlinking. Ontology matching can take advantage of linked data as an external source of information, which is fully relevant to the “matching with background knowledge” challenge. Conversely, data interlinking can benefit from ontology matching by using correspondences to focus the search for potential instance level links [78]. Since 2009, OAEI has reacted to this need by hosting a specific instance matching track. However, data interlinking is a more specific topic, which is out of the scope of this paper.

The challenges are articulated as follows. We start with the issue of evaluating ontology matching (§7), since this theme has had a large impact on the development of matchers in recent years and it shows their practical usefulness. Then, the next three challenges (§8–10) are concerned with creating better (more effective and efficient) automatic matching technology and cover, respectively, such aspects as: efficiency of ontology matching techniques (§8), involving background knowledge (§9), matcher selection, combination and self-configuration (§10). These problems have become prevalent with the advent of applications requiring run time matchers. In turn, Sections 11–13 consider matchers and alignments in their relation with users and respectively cover how and where to involve users of the matching technology (§11), and what explanations of matching results are required (§12). Moreover, users can be considered collectively when working collaboratively on alignments (§13). This, in turn, requires an alignment infrastructure for sharing and reusing alignments (§14). Solving these problems would bring ontology matching closer to final users and make it better able to meet their needs.

To understand better the most pressing issues for the different types of applications (§3), Table 3 crosses the challenges identified and two broad types of applications, i.e., those requiring design time and run time matching. Half of the challenges are largely important for both design and run time applications, while the other half is primarily important for one or the other type of application, thereby showing commonalities and specificities of these. For example, efficiency of ontology matching techniques is vital for run time applications, while involving background knowledge, matcher selection and self-configuration are crucial for improving quality (precision, recall) of matching results in both design and run time applications.

Challenges vs. applications (columns: Design time, Run time)
Large-scale evaluation (§7) √
Efficiency of ontology matching (§8) √
Matching with background knowledge (§9) √ √
Matcher selection and self-configuration (§10) √ √
User involvement (§11) √ √
Explanations of ontology matching (§12) √
Collaborative and social ontology matching (§13) √
Alignment infrastructure (§14) √ √

TABLE 3: Applications vs. challenges. The checkmarks indicate the primary impact of the challenges under consideration on two broad types of applications.

Each of the challenges is articulated in three parts: definition of the challenge, overview of recent advances (that complement those discussed in §4), and discussion of potential approaches to tackle the challenge under consideration.

7 LARGE-SCALE MATCHING EVALUATION

The growth of matching approaches makes the issues of their evaluation and comparison more severe. In fact, there are many issues to be addressed in order to empirically prove the matching technology mature and reliable.

The challenge. Large tests involving 10,000, 100,000, and 1,000,000 entities per ontology are to be designed and conducted. In turn, this raises the issue of a wider automation of the acquisition of reference alignments, e.g., by minimizing the human effort while increasing the evaluation dataset size.

We believe that the point of large-scale evaluation is of prime importance, though there are some other issues around ontology matching evaluation to be addressed as well:

• More accurate evaluation quality measures, besides precision and recall [2], are needed. Application specific measures have to be developed in order to assess whether the result of matching, e.g., an F-measure of 40% or 80%, is good enough for a particular application, such as navigation among collections of masterpieces in the cultural heritage settings (§3) or web service matching. This should help quantify more precisely the usefulness of, and the differences between, matching systems in some hard metrics, such as development time.

• Interoperability benchmarks, testing the ability of exchanging data without loss of information between the ontology matching tools, have to be designed and conducted.

• A methodology and test cases allowing for a comparative evaluation of instance-based matching approaches have to be designed and conducted.

Recent advances. OAEI campaigns gave some preliminary evidence of the scalability characteristics of the ontology matching technology. For example, most of the test cases of OAEI dealt with thousands of matching tasks, with the exception of the very large cross-lingual resources test case of OAEI-2008 [57]. Similar observations can be made as well with respect to individual matching evaluations [79–82].

Below we summarize the recent advances along the three previous issues:

• Initial steps towards better evaluation measures have already been taken by proposing semantic versions of precision and recall [83], implemented in [84] and in the Alignment API [85, 86]. In turn, an early attempt to introduce application specific measures was made in the library test case (a variation of the cultural heritage case in §3) of OAEI-2008 [24]. This is similar to the task-based evaluation used in ontology learning [87, 88].

• Despite efforts on meta-matching systems, on composing matchers [39, 89, 90], on the Alignment API [85] and in the SEALS⁵ project promoting automation of evaluations, in particular for ontology matching [91], the topic of interoperability between matching tools remains largely unaddressed.

• The theme of a comparative evaluation of instance-based matching approaches is in its infancy; some test cases that have been used in the past can be found in [92–95], while a recent approach towards a benchmark for instance matching was proposed and implemented in OAEI-2009 and OAEI-2010 [58, 59, 96].

Discussion. A plausible step towards large-scale ontology matching evaluation was taken within the very large cross-lingual resources test case of OAEI-2008. In particular, it involved matching among the following three resources: (i) WordNet, which is a lexical database for English, (ii) DBPedia, which is a collection of “things”, each tied to an article in the English language Wikipedia, (iii) GTAA, which is a Dutch thesaurus used by the Netherlands Institute for Sound and Vision to index TV programs. The numbers of entities involved from each of the resources are 82,000, 2,180,000 and 160,000, respectively. OAEI-2009 went one step further by having a specific instance matching track where the whole linked open dataset⁶ was involved.

5. Semantic Evaluation at Large Scale: http://www.seals-project.eu

6. http://www.linkeddata.org


Finding a large-scale real world test case is not enough for running an evaluation. A reference alignment, against which the results provided by matching systems are compared, has to be created. A typical approach is to build it manually; however, the number of possible correspondences grows quadratically with the number of entities to be compared. This often makes the manual construction of the reference correspondences demanding to the point of being infeasible for large-scale matching tasks. A semi-automatic approach to the construction of reference alignments has been proposed in [61], which can be used as a starting point.
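The quadratic growth can be made tangible with a few lines of arithmetic. The sketch below simply multiplies the sizes of the two ontologies, assuming that every pair of entities is a possible correspondence; the function name is hypothetical, and the sizes are those mentioned in this section.

```python
def possible_correspondences(size1, size2):
    """Number of entity pairs a human would have to consider in the worst case."""
    return size1 * size2


for size in (10_000, 100_000, 1_000_000):
    pairs = possible_correspondences(size, size)
    print(f"{size:>9,} entities per ontology: {pairs:,} possible correspondences")

# WordNet (82,000 entities) against DBPedia (2,180,000 entities), as in OAEI-2008.
wordnet_dbpedia = possible_correspondences(82_000, 2_180_000)
print(f"WordNet vs. DBPedia: {wordnet_dbpedia:,} possible correspondences")
```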

It remains difficult to know which matcher fits best to which task or application. To this end, the notion of hardness [97] for matching, identifying the degree of difficulty of a particular test, would be useful. This would allow for automatically generating tests with particular characteristics of required hardness. This would also allow for defining test profiles (specifying dataset characteristics and measures) for different types of applications.

8 EFFICIENCY OF MATCHING TECHNIQUES

Besides quality, the efficiency of matchers is of prime importance in dynamic applications, especially when a user cannot wait too long for the system to respond or when memory is limited. Current ontology matchers are mostly design time tools which are usually not optimized for resource consumption.

The challenge. The execution time indicates efficiency properties of matchers. However, good execution time can be achieved by using a large amount of main memory or bandwidth, which have to be taken into account on a par with the other computational resources, such as CPU.

Thus, usage of main memory should also be measured or improved. Moreover, we can expect the need for matching on handheld computers or smartphones in the near future. Overall, the challenge is to come up with scalable ontology matching reference solutions.

Recent advances. As Section 4 indicates, the issue of efficiency was addressed explicitly by many recent systems. However, for instance, in the anatomy track of OAEI-2007 [56], only a few systems, such as Falcon (§4.2), took several minutes to complete this matching task, while other systems took much more time (hours and even days). In OAEI-2009, Anchor-Flood (§4.6) managed to solve it in 15 seconds. In the very large cross-lingual resources test case of OAEI-2008 only DSSim (§4.3) took part (out of 13 participants), though the input files were manually split into fragments and then the matching system was applied on the pairs of these fragments.

Discussion. Efficiency issues can be tackled through a number of strategies, including:

• parallelization of matching tasks, e.g., cluster computing;

• distribution of matching tasks over peers with available computational resources;

• approximation of matching results, which over time become better, e.g., more complete;

• modularization of ontologies, yielding smaller more targeted matching tasks;

• optimization of existing and empirically proved-to-be-useful matching methods.

To our knowledge, the first two items above remain largely unaddressed so far, and thus will have to be covered in the future. There are tasks, such as matching the very large cross-lingual resources of OAEI-2008, which the existing matching technology cannot handle automatically (the resources were too large). More computing power does not necessarily improve matching quality, but, at least at the beginning, it would accelerate the first run and the analysis of the bottlenecks. To this end, the approaches taken in the LarKC project [98] to realize the strategies mentioned previously (e.g., through the divide-conquer-swap strategy, which extends the traditional approach of divide-and-conquer with an iterative procedure whose result converges towards completeness over time) should be examined and adapted to ontology matching. The existing work has mainly focused on the last three items. Below, we give insights on potential further developments of the themes of approximation, modularization and optimization.

The complexity of matching (in a pair-wise set up) is usually proportional to the size of the ontologies under consideration and the number of matching algorithms employed. A straightforward approach here is to reduce the number of pair-wise comparisons in favor of an (incomplete) top-down strategy, as implemented in QOM [99], or to avoid using computationally expensive matching methods, such as in RiMOM [47], by suppressing the structure based strategies and by applying only a simple version of the linguistic based strategies.

Another worthwhile direction to avoid exhaustive pair-wise comparisons, which appears to be particularly promising when handling large ontologies, is based on segment-based approaches, e.g., COMA++ [79] and Anchor-Flood (§4.6), thus targeting the matching of only sufficiently similar segments. This theme has to be further and more systematically developed. It is also worth investigating how to automatically partition large ontologies into proper segments [40]. The efficiency of the integration of various matchers can be improved by minimizing (with the help of clustering, such as in PORSCHE [100] and XClust [101]) the target search space for a source ontology entity.
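The effect of restricting comparisons to sufficiently similar parts of the ontologies can be sketched as follows. This is not the partitioning used by COMA++ or Anchor-Flood: it is a simple token-based blocking scheme, chosen here only as an assumption, that compares two entities when their labels share at least one token. All names and labels are hypothetical.

```python
import re
from collections import defaultdict


def tokens(label):
    """Split labels such as 'JournalArticle' or 'journal_paper' into lowercase tokens."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", label)}


def blocked_pairs(labels1, labels2):
    """Return only the label pairs that share at least one token."""
    index = defaultdict(set)  # token -> labels of the second ontology containing it
    for label2 in labels2:
        for token in tokens(label2):
            index[token].add(label2)
    return {(label1, label2)
            for label1 in labels1
            for token in tokens(label1)
            for label2 in index.get(token, ())}


labels1 = ["JournalArticle", "ConferencePaper", "Person"]
labels2 = ["journal_paper", "Human", "conference_article"]
pairs = blocked_pairs(labels1, labels2)
print(f"{len(pairs)} candidate pairs instead of {len(labels1) * len(labels2)}")
```

The reduction comes at the price of incompleteness: the pair (Person, Human) is never compared, which is the kind of trade-off that has to be evaluated systematically.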

Optimizations are worth performing only once the underlying basic techniques are stable. For example, in the case of S-Match [5, 102] the matching problem was reduced to the validity problem for the propositional calculus. The basic version of S-Match used a standard satisfiability procedure of SAT4J⁷. Once it had been realized that the approach was promising (based on preliminary evaluations), the efficiency problems were tackled. Specifically, for some frequent practical cases (e.g., when the propositional formula encoding a matching problem appears to be Horn) satisfiability can be tested in linear time, while a standard propositional satisfiability solver would require quadratic time [5]. Finally, the LogMap approach [103] used an incomplete reasoner as well as a number of optimizations to obtain the results faster, thereby exploiting several strategies to improve efficiency.
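The Horn case can be illustrated with forward chaining, the standard way of deciding satisfiability of Horn formulas in time linear in the size of the formula. The sketch below is not the S-Match implementation; the clause encoding as (body, head) pairs and the toy axioms are assumptions. To check that a relation such as A ⊑ B follows from the axioms, one tests that the axioms together with A and the negation of B are unsatisfiable.

```python
from collections import defaultdict, deque


def horn_satisfiable(clauses):
    """Decide satisfiability of Horn clauses given as (body, head) pairs.

    body is a collection of atoms that must all hold; head is the implied atom,
    or None for a purely negative clause (the body atoms cannot all hold).
    """
    remaining = []                # per clause: number of body atoms not yet derived
    watchers = defaultdict(list)  # atom -> indices of the clauses using it in their body
    queue = deque()               # clauses whose body is entirely derived
    for index, (body, head) in enumerate(clauses):
        body = set(body)
        remaining.append(len(body))
        for atom in body:
            watchers[atom].append(index)
        if not body:
            queue.append(index)
    derived = set()
    while queue:
        head = clauses[queue.popleft()][1]
        if head is None:
            return False          # a negative clause is violated
        if head in derived:
            continue
        derived.add(head)
        for index in watchers[head]:
            remaining[index] -= 1
            if remaining[index] == 0:
                queue.append(index)
    return True


# Toy axioms (hypothetical): Beef -> RedMeat, RedMeat -> Food.
axioms = [({"Beef"}, "RedMeat"), ({"RedMeat"}, "Food")]
# Does Beef ⊑ Food hold? Add the fact Beef and the negated goal Food.
query = axioms + [(set(), "Beef"), ({"Food"}, None)]
print("Beef is less general than Food:", not horn_satisfiable(query))
```

Each atom is derived at most once and each body occurrence is decremented at most once, which is what keeps the procedure linear.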

9 MATCHING WITH BACKGROUND KNOWLEDGE

One source of difficulty for matching is that ontologies are designed in a particular context, with some background knowledge, which often does not become part of the final ontology specification.

The challenge. Matching can be performed by discovering a common context or background knowledge for the ontologies and using it to extract relations between ontology entities. This context can take different forms, such as a set of resources (web pages, pictures, etc.) which have been annotated with the concepts from an ontology, which provides common anchors to the ontologies to be matched. The difficulty is a matter of balance: adding context provides new information, and hence, helps increase recall, but this new information may also generate incorrect matches, which decreases precision.

Recent advances. Various strategies have been used to deal with the lack of background knowledge. In particular:

• Declaring missing axioms manually as a pre-match effort (Cupid [3, 6], COMA [104], a cluster-based approach proposed in [105]) or using partial input alignments (SAMBO [106]).

• Reusing previous matches (COMA++ [79]). More generally, storing and sharing existing alignments can be used for composing alignments, which helps solve part of the matching problem.

• Using the web as background knowledge [107], and specifically, exploiting linked data as background knowledge [108, 109] or the work on search engine weighted approximate matching [110].

• Using domain specific corpora (of schemas and mappings) [107, 111] or schema covers [112].

• Using domain specific ontologies, e.g., in the field of anatomy [64, 107], upper-level ontologies [113, 114], or all the ontologies available on the semantic web, such as in the work on Scarlet [115].

7. http://www.sat4j.org/

In addition, the work on S-Match [116] discussed an automatic approach to deal with the lack of background knowledge in matching tasks by using semantic matching [117, 118] iteratively. On top of S-Match, the work in [119] discussed the use of UMLS, instead of WordNet, as a source of background knowledge in medical applications.

[Figure 6 diagram: the Agrovoc and NAL concepts Beef and Food, and the TAP concepts Beef, RedMeat, MeatOrPoultry and Food, connected by = and ⊑ relations.]

Fig. 6: Use of background knowledge in Scarlet [115]. The process is made of two steps: (i) finding an ontology referring to the concepts to be matched, (ii) inferring a relation between these concepts as a function of those of the background ontology.

The techniques mentioned above have helped improve the results of matchers in various cases. For instance, Figure 6 shows two entities from the Agrovoc⁸ and NAL⁹ thesauri that had to be matched in the food test case of OAEI-2007. When considering concepts Beef and Food, the use of background knowledge found on the web, such as the TAP¹⁰ ontology, helps deduce that Beef is less general than Food. The same result can also be obtained with the help of WordNet, since Beef is a hyponym (is a kind) of Food. Thus, multiple sources of background knowledge can simultaneously help.
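The two steps of Figure 6 can be sketched as follows. This is a simplified illustration rather than the Scarlet implementation: the background ontology is assumed to be a dictionary of direct subclass links, concepts are anchored by exact label equality, and the TAP fragment is a hypothetical reconstruction based on the concept names appearing in Figure 6.

```python
def ancestors(concept, sub_class_of):
    """All concepts reachable from `concept` by following subclass links upwards."""
    seen, stack = set(), [concept]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_relation(source, target, sub_class_of):
    """Return '=', '⊑', '⊒' or None for two labels anchored in the background ontology."""
    if source == target:
        return "="
    if target in ancestors(source, sub_class_of):
        return "⊑"
    if source in ancestors(target, sub_class_of):
        return "⊒"
    return None


# Assumed fragment of the background ontology: direct subclass links only.
tap_fragment = {"Beef": ["RedMeat"],
                "RedMeat": ["MeatOrPoultry"],
                "MeatOrPoultry": ["Food"]}

relation = infer_relation("Beef", "Food", tap_fragment)
print("Beef", relation, "Food")
```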

Discussion. The techniques mentioned before can undergo different variations based on:

• the way background knowledge sources are identified to be useful, e.g., if there are enough entities in common for a particular matching task;

• the way background knowledge sources are selected, i.e., given multiple sources, such as domain specific ontologies and upper-level ontologies, identified in the former step, selecting one or a combination of these to use;

• the way ontology entities are matched against the background knowledge sources, e.g., by employing simple string-based techniques or more sophisticated matchers;

• the way the obtained results are combined or aggregated, e.g., by majority voting.
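The last variation, aggregation by majority voting, can be sketched as follows. The sketch assumes that each background knowledge source proposes at most one relation for a given pair of entities and that a relation is retained only when a strict majority of the answering sources agree on it; the source names other than TAP and WordNet are hypothetical.

```python
from collections import Counter


def majority_vote(proposals):
    """Return the relation proposed by a strict majority of the sources, or None."""
    votes = Counter(rel for rel in proposals.values() if rel is not None)
    if not votes:
        return None
    relation, count = votes.most_common(1)[0]
    return relation if count > sum(votes.values()) / 2 else None


# Relations proposed for the pair (Beef, Food) by three sources; the third is made up.
proposals = {"TAP": "⊑", "WordNet": "⊑", "other_source": "="}
winner = majority_vote(proposals)
print("Beef", winner, "Food")
```

Requiring a strict majority is one possible design choice: it favors precision, since ties and isolated answers produce no correspondence at all.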

8. http://www.fao.org/aims/ag intro.htm

9. http://www.nal.usda.gov/


Once the necessary knowledge has been recovered, e.g., through a composition of several auxiliary resources, the issue is how to maintain it. Several alternatives can be explored, including: (i) extending (privately or locally) general purpose resources, such as WordNet or schema.org, towards specific domain knowledge, (ii) sharing the recovered knowledge (publicly) as linked open data.

The insights provided above have to be systematically investigated, combined in a complementary fashion and evaluated. This is particularly important in dynamic settings, where the matching input is often shallow (especially when dealing with fragmented descriptions), and therefore, incorporates fewer clues. To this end, it is vital to identify the minimal background knowledge necessary, e.g., a part of TAP in the example of Figure 6, to resolve a particular problem with sufficiently good results.

10 MATCHER SELECTION, COMBINATION AND TUNING

Many matchers are now available. As OAEI campaigns indicate (§5), there is no single matcher that clearly dominates others. Often these perform well in some cases and not so well in some other cases. Both for design and run time matching, it is necessary to be able to take advantage of the best configuration of matchers.

The challenge. There is evidence from OAEI (§5) that matchers do not necessarily find the same correct correspondences. Usually several competing matchers are applied to the same pair of entities in order to increase evidence towards a potential match or mismatch. This requires solving several important problems: (i) selecting matchers and combining them, and (ii) self-configuring or tuning matchers. On top of this, for dynamic applications it is necessary to perform matcher combination and self-tuning at run time, and thus, efficiency of the configuration search strategies becomes critical. As the number of available matchers increases, the problem of their selection will become more critical, e.g., when the task will be to handle more than 50 matchers within one system.

Recent advances. The problem of matcher selection has been addressed, for example, through analytic hierarchy process [120], ad hoc rules [121, 122] or a graphical matching process editor [123]. Often the matcher selection is tackled by setting appropriate weights (in [0, 1]) to matchers that are predefined in a pool (of usually at most several dozens of matchers) and whose results are to be further aggregated. So far, mostly design time toolboxes allow this to be done manually [16, 79, 124].
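A minimal sketch of such a weighted aggregation is given below. The matcher names, the weights, the similarity values and the acceptance threshold are all assumptions made for illustration; setting a weight to 0 amounts to deselecting the corresponding matcher.

```python
def aggregate(similarities, weights):
    """Weighted average of the similarity values produced by several matchers."""
    total = sum(weights[matcher] for matcher in similarities)
    if total == 0:
        return 0.0
    return sum(weights[matcher] * value
               for matcher, value in similarities.items()) / total


# Weights in [0, 1] set manually for a pool of three (hypothetical) matchers.
weights = {"edit_distance": 0.5, "wordnet": 0.3, "structure": 0.2}
# Similarities computed by each matcher for one pair of entities.
similarities = {"edit_distance": 0.9, "wordnet": 0.6, "structure": 0.4}

score = aggregate(similarities, weights)
accepted = score >= 0.7  # assumed acceptance threshold
print(f"aggregated similarity {score:.2f}, accepted: {accepted}")
```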

Another approach involves ontology meta-matching [89, 125], i.e., a framework for combining a set of selected ontology matchers. Instead of least-square linear regression as in [126], the work in [81] uses a machine learning technique, called boosting (the AdaBoost algorithm) in order to select matchers from a pool to be further used in combination. Multi-agent techniques have also been used for that purpose, e.g., [127] exploits the max-sum algorithm to maximize the utility of a set of agents, while [128] uses argumentation schemes to combine matching results.
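To give an intuition of how boosting can select matchers from a pool, the sketch below applies textbook discrete AdaBoost to matcher votes. It is not the procedure of [81]: it assumes that each matcher has already been reduced to a +1/-1 vote per candidate pair and that a set of labelled pairs is available; `boost_select`, `predictions` and `labels` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def boost_select(predictions: Dict[str, List[int]],
                 labels: List[int],
                 rounds: int) -> List[Tuple[str, float]]:
    """Pick matchers with discrete AdaBoost.

    predictions[m][i] is matcher m's vote (+1 match, -1 no match) on pair i.
    labels[i] is the correct answer (+1 or -1) for pair i.
    Returns the selected matchers with their combination weights (alpha).
    """
    n = len(labels)
    sample_weights = [1.0 / n] * n
    chosen: List[Tuple[str, float]] = []

    for _ in range(rounds):
        # Weighted error of every matcher under the current distribution.
        errors = {
            name: sum(w for w, vote, y in zip(sample_weights, votes, labels)
                      if vote != y)
            for name, votes in predictions.items()
        }
        best = min(errors, key=errors.get)
        if errors[best] >= 0.5:
            break  # no matcher is better than chance on the current weights

        # Clamp the error so that alpha stays finite for a perfect matcher.
        err = max(errors[best], 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        chosen.append((best, alpha))
        if errors[best] == 0:
            break  # a perfect matcher leaves nothing to correct

        # Increase the weight of misclassified pairs, decrease the others.
        sample_weights = [
            w * math.exp(-alpha * y * vote)
            for w, vote, y in zip(sample_weights, predictions[best], labels)
        ]
        norm = sum(sample_weights)
        sample_weights = [w / norm for w in sample_weights]

    return chosen
```

The returned alpha values can serve as weights when the selected matchers are combined, so that selection and combination come out of the same training run.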

The work in [89] proposed an approach to tune a library of schema matchers at design time: given a particular matching task, it automatically tunes a matching system by choosing suitable matchers, and the best parameters to be used, such as thresholds. The work in [129] discussed consensus building after many methods have been used.
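As a rough illustration of design time threshold tuning, the sketch below searches a list of candidate thresholds and keeps the one with the best F-measure. It assumes that a reference alignment is available for the task, which is a strong assumption outside evaluation settings; `f_measure` and `tune_threshold` are hypothetical helpers, not the method of [89].

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def f_measure(found: Set[Pair], reference: Set[Pair]) -> float:
    """Harmonic mean of precision and recall of an alignment."""
    correct = len(found & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(found)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(similarities: Dict[Pair, float],
                   reference: Set[Pair],
                   candidates: Iterable[float]) -> Tuple[float, float]:
    """Return (threshold, F-measure) of the best candidate threshold.

    A pair is kept in the alignment when its similarity is >= threshold.
    Ties are resolved in favour of the first candidate encountered.
    """
    candidate_list = list(candidates)
    if not candidate_list:
        raise ValueError("at least one candidate threshold is required")

    best_threshold, best_score = candidate_list[0], -1.0
    for threshold in candidate_list:
        found = {pair for pair, sim in similarities.items() if sim >= threshold}
        score = f_measure(found, reference)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

An exhaustive search of this kind is only practical for a handful of parameters, which is one reason why the size of the configuration search space matters for run time tuning.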

Discussion. The above mentioned problems share common characteristics: (i) the search space is very large, and (ii) the decision is made involving multiple criteria. Resolving these two problems simultaneously at run time makes ontology matching even harder.

The work on evaluation (§5) can be used in order to assess the strengths and the weaknesses of individual matchers by comparing their results with task requirements. Often, there are many different constraints and requirements applied to the matching tasks, e.g., correctness, completeness, execution time, main memory, thereby involving multi-decision criteria. The main issue is the semi-automatic combination of matchers by looking for complementarities, balancing the weaknesses and reinforcing the strengths of the components. For example, the aggregation is usually performed following a pre-defined aggregation function, such as a weighted average. Novel ways of performing aggregation with provable qualities of alignments have to be looked for in order to go beyond the incremental progress that we observed in the recent years. For example, one of the plausible directions to pursue was investigated in [130], which proposed to use a decision tree as an aggregation function, where the nodes represent the similarity measures and edges are used as conditions on the results. Such a decision tree represents a plan whose elementary operations are matching algorithms. Further issues to be addressed include investigating the automatic generation of decision trees based on an application domain.

In the web setting, it is natural that applications are constantly changing their characteristics. Therefore, approaches that attempt to tune and adapt automatically matching solutions to the settings in which an application operates are of high importance. This may involve the run time reconfiguration of a matcher by finding its most appropriate parameters, such as thresholds, weights, and coefficients. The above mentioned work in [130] also contributed to the theme of tuning. Specifically, since edges in the decision tree are used as conditions, these can be viewed as thresholds, personalized to each matcher. Thus, various ways of encoding the matcher combination and the tuning problem have to be explored and developed further.
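The decision tree encoding can be sketched as follows, assuming a binary tree whose inner nodes hold a similarity measure with its own threshold and whose leaves hold the final decision. The `Node` class, the two toy measures and the chosen thresholds are illustrative assumptions and are not taken from [130].

```python
from dataclasses import dataclass
from typing import Callable, Union

Similarity = Callable[[str, str], float]


@dataclass
class Node:
    measure: Similarity           # similarity measure evaluated at this node
    threshold: float              # condition on the outgoing edges
    if_high: Union["Node", bool]  # followed when measure >= threshold
    if_low: Union["Node", bool]   # followed otherwise


def decide(tree: Union[Node, bool], a: str, b: str) -> bool:
    """Walk the tree until a leaf (True = match, False = no match)."""
    while isinstance(tree, Node):
        if tree.measure(a, b) >= tree.threshold:
            tree = tree.if_high
        else:
            tree = tree.if_low
    return tree


def equal_ignoring_case(a: str, b: str) -> float:
    """Toy measure: 1.0 for identical labels up to case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def token_overlap(a: str, b: str) -> float:
    """Toy measure: Jaccard overlap of whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


# The cheap test runs first; the second measure is computed only if needed.
tree = Node(equal_ignoring_case, 1.0, True,
            Node(token_overlap, 0.5, True, False))
```

With this tree, `decide(tree, "Author", "author")` is settled by the first node alone, while `decide(tree, "book title", "title of book")` falls through to the token overlap test. Each node carries its own threshold, which is the sense in which thresholds are personalized to each matcher.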

11 USER INVOLVEMENT

In traditional applications, the result of matching performed at design time is screened by human users before being accepted. However, the overwhelming size of data may render this task difficult. In dynamic applications, users are generally not ontology matching specialists who can be asked to inspect the alignments. Hence, in both cases, user involvement becomes crucial.

The challenge is to design ways of involving users so that they can help the matching process without being lost in the amount of results. The issue is, both for design and run time matching, to design interaction schemes that do not burden the user. At design time, interaction should be both natural and complete; at run time, it should be hidden in the user task.

Recent advances. So far, there have been only a few studies on how to involve users in ontology matching. The works in [131, 132] proposed to use query logs to enhance match candidate generation. Several efforts were dedicated to design time matcher interaction, such as in [79, 133]. Some recent works have focussed on the ergonomic aspect of elaborating alignments, either for designing them manually or for checking and correcting them, e.g., through learning [134, 135]. Specifically, the work in [136] proposed a graphical visualization of alignments based on cognitive studies. In turn, the work in [137] provided an environment for manually designing complex alignments through the use of connected perspective that allows for quickly deemphasizing non-relevant aspects of the ontologies being matched while keeping the connections between relevant entities. The work in [138] provided the Clip tool that allows for explicitly specifying structural transformations by means of a visual language, in addition to value couplings to be associated to correspondences.

Discussion. With the development of interactive approaches the issues of their usability will become more prominent. This includes scalability of visualization [139] and better user interfaces in general, which are expected to bring higher quality gains than more accurate matching algorithms [140].

An interesting trend to follow concerning user involvement relies on final users in order to learn from them, given a matching task, what is the best system configuration to approach that task. Moreover, for dynamic applications, only the final user can help. This can be exploited by adjusting matching system parameters (§10), or by experimenting with alignment selection strategies. In order to facilitate this, matching tools have to be configurable and customizable. Hence, users themselves could improve these tools, thereby arriving at the exact solution that best fits their needs and preferences. When users are given this freedom by working on tool customization, they can also provide useful feedback to system designers. Involving final users in an active manner in a matching project would increase its impact, as users who recognize the actual need, also have promising ideas on how to approach it [141]. When these “lead” users want something that is not available on the market, high benefits may be expected from such endeavors.

Technically, a basic premise underlying user interaction design is that users of a matching system should be able to influence the search for an optimal alignment on various levels via unified interfaces. For example, they could do so by recommending relevant background knowledge in advance, by influencing the selection and weighting of the various matching components, by criticizing aspects of intermediate results, and by determining whether the final result is good enough to be put to use. Little attention has been devoted so far to the realization of interfaces that actually allow users to become active in these ways. Systems should be developed on the basis of continual tests with final users, and the ultimate success criterion will be the extent to which the system has value for them.

Finally, as more systems become equipped with GUIs (see Table 1), we expect that evaluation of usability and customizability of such systems will become more prominent, e.g., included as evaluation indicators of the OAEI campaigns.

12 EXPLANATION OF MATCHING RESULTS

In order to better edit alignments, thereby providing feedback to the system, users need to understand them. It is often not sufficient for a matcher to return an alignment for users to understand it immediately. In order for matching systems to gain a wider acceptance and to be trusted by users, it will be necessary that they provide explanations of their results to users or to other programs that exploit them. Notice that the issues of trustworthiness and provenance become particularly important in the web settings that enable social and collaborative matching (§13).

The challenge is to provide explanations in a simple, yet clear and precise, way to the user in order to facilitate informed decision making. In particular, many sophisticated techniques used by matching systems, e.g., machine learning or discrete optimization, do not yield simple or symbolic explanations.

Recent advances. There are only a few matching systems able to provide an explanation for their results [128, 142, 143]. The solutions proposed so far focus on default explanations, explaining basic matchers, explaining the matching process, and negotiating alignments by argumentation.