
Comparative Evaluations

Comparative evaluations include the CUPID [88] evaluation and the evaluation by Madhavan et al. in [92]. Furthermore, we identified several works, in particular the S-MATCH evaluations [3, 53, 54] and three other evaluations [131, 87, 39], that compare their own approaches with our first COMA prototype [29], which is now significantly outperformed by COMA++ in terms of both quality and execution time (see Chapter 11). Although they aim at a uniform comparison of several approaches, comparative evaluations still depend heavily on the subjective choices of the authors conducting them. Furthermore, a lack of detailed knowledge about the tuning capabilities of other authors' tools may cause those tools to produce suboptimal results. These effects can be reduced to a large extent, as done by the EON Ontology Alignment Contest [44, 48], by requiring the tool authors themselves to perform the evaluation uniformly on an independently developed test base.

CUPID vs. DIKE and MOMIS

The CUPID evaluation was a pioneer of this kind. In [88], the authors compared the quality of CUPID with two previous prototypes, DIKE and MOMIS, which had not been evaluated before. Some pre-match effort was needed to specify domain synonyms and abbreviations in the required format for each system. First, the systems were tested with some canonical match tasks involving very small schema fragments. Second, the systems were tested with two real-world XML schemas for purchase orders, which also form the smallest match task in the first evaluation of COMA [29] and in the COMA++ evaluation (see Section 11.1). The authors then compared the systems by checking which correspondences each system could or could not identify. CUPID was able to identify all necessary correspondences for this match task and thus showed better quality than the other systems. In the entire evaluation, no quality measures were computed and no execution times were reported.
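The qualitative comparison used in [88], checking which expected correspondences each system found or missed, amounts to simple set operations. The following is a minimal Python sketch under that reading; the element names and the systems `SystemA` and `SystemB` are hypothetical and do not reproduce the actual schemas or results of [88].

```python
# Hypothetical expected ("real") correspondences between two small
# purchase-order schemas; names are illustrative, not taken from [88].
REAL = {
    ("PO.ShipTo", "PurchaseOrder.DeliverTo"),
    ("PO.BillTo", "PurchaseOrder.InvoiceTo"),
    ("PO.Item.Qty", "PurchaseOrder.Item.Quantity"),
}


def compare_systems(real, found_by_system):
    """Report, per system, which expected correspondences it identified
    and which it missed, without computing any quality measure."""
    return {
        system: {
            "identified": sorted(real & found),
            "missed": sorted(real - found),
        }
        for system, found in found_by_system.items()
    }


# Hypothetical outputs of two systems.
found = {
    "SystemA": set(REAL),  # identifies every expected correspondence
    "SystemB": {("PO.ShipTo", "PurchaseOrder.DeliverTo")},
}
report = compare_systems(REAL, found)
for system, result in report.items():
    print(system, "missed:", result["missed"])
```

Such a report shows which system misses which correspondence, but it yields no quality measure that could be compared across match tasks.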

Corpus-based Matching vs. GLUE and MKB

In [92], the corpus-based match approach was compared with two other algorithms, GLUE [35] and MKB [91], all developed by the same authors. MKB is a preliminary version of the corpus-based approach, while GLUE matches the input schemas directly rather than indirectly via a corpus. The three algorithms were uniformly tested on four domains, each comprising a large number of schemas (26-34). However, the schemas were rather small (7-41 elements). For all algorithms, pre-match effort was required to specify domain constraints (match and mismatch rules) and to train the learners. The evaluation employed three quality measures, Precision, Recall, and Fmeasure, and did not consider execution time. The corpus-based approach was shown to outperform the other two (see Figure 13.3). However, it was unclear whether a systematic evaluation was performed to obtain the best quality for each algorithm, and how different configuration parameters (learner combinations, strategies for selecting similar elements, constraint usage) influence the quality of the individual algorithms. Like LSD, GLUE, and IMAP, the corpus-based approach achieved an average Precision and Recall of around 0.8. The authors further evaluated the corpus-based algorithm combined with the structure matcher of SIMILARITYFLOODING. While it significantly improved on the quality of the standalone SF algorithm, the combination of the two approaches performed worse than the standalone corpus-based algorithm.
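Precision, Recall, and Fmeasure are commonly defined over the set of correspondences derived by a match algorithm and the set of manually determined real correspondences. A minimal Python sketch under these standard set-based definitions follows; the function name `quality` and the toy correspondences are hypothetical and are not data from [92].

```python
def quality(real, derived):
    """Return (Precision, Recall, Fmeasure) of a derived match result.

    real    -- set of manually determined (real) correspondences
    derived -- set of correspondences proposed by a match algorithm
    """
    true_positives = len(real & derived)
    precision = true_positives / len(derived) if derived else 0.0
    recall = true_positives / len(real) if real else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Fmeasure is the harmonic mean of Precision and Recall.
    fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure


# Toy example: 4 real correspondences, 3 derived, 2 of them correct.
real = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
derived = {("a", "x"), ("b", "y"), ("c", "q")}
p, r, f = quality(real, derived)
print(f"Precision={p:.2f} Recall={r:.2f} Fmeasure={f:.2f}")
# prints Precision=0.67 Recall=0.50 Fmeasure=0.57
```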

Figure 13.3 Corpus-based (Augment) vs. GLUE (Direct) and MKB (Pivot) [92]

S-MATCH vs. COMA, CUPID, and SIMILARITYFLOODING

S-MATCH was compared with COMA, CUPID, and SF in three different evaluations, which are described in [53], [54], and [3], respectively. The first evaluation [53] considered not only quality but also the execution time of each prototype. However, it was based on only three simple match tasks with average schema sizes of 5, 10, and 30 elements, respectively. S-MATCH depends on oracles, i.e., auxiliary sources; however, it was unclear which ones were used and whether the same information was also provided to the other prototypes. Match quality was measured using four measures: Precision, Recall, Fmeasure, and Overall. S-MATCH was reported to achieve on average slightly better quality than COMA, which in turn outperformed the remaining prototypes. However, while S-MATCH was tested with different system configurations (e.g., directions for match candidate selection), it was unclear which configurations were tested for the other prototypes. As for execution time, S-MATCH was significantly slower than the other systems. After some performance optimizations of S-MATCH, the second evaluation [54] focused only on execution time and ignored quality aspects completely. For taxonomies with hundreds of nodes, S-MATCH was shown to be faster than COMA but still slower than SF. The third evaluation [3] compared S-MATCH with COMA, though only in terms of Recall, by applying them to very large taxonomies with several hundred thousand nodes. Using the default matcher combination, COMA showed significantly better Recall than S-MATCH, which was then optimized with some heuristics, derived directly from its missed matches, to achieve better Recall than COMA. Unfortunately, no Precision values were reported and no time studies were conducted.
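Unlike the other three measures, Overall estimates the post-match effort needed to fix a result, i.e., adding the missed real correspondences and deleting the false ones. Assuming the definition used in the COMA evaluation, Overall = (|I| - |F|) / |R| = Recall * (2 - 1/Precision), where I are the correctly derived, F the falsely derived, and R the real correspondences, the following Python sketch computes it in both forms; the function names and toy data are hypothetical.

```python
def overall(real, derived):
    """Overall = (|I| - |F|) / |R|: 1 minus the relative post-match effort of
    adding the missed real correspondences and deleting the false ones."""
    true_positives = len(real & derived)   # |I|
    false_positives = len(derived - real)  # |F|
    return (true_positives - false_positives) / len(real)


def overall_from_pr(precision, recall):
    """Equivalent form Recall * (2 - 1/Precision), valid for Precision > 0."""
    return recall * (2 - 1 / precision)


# Toy example: 4 real correspondences, 3 derived, 2 of them correct.
real = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
derived = {("a", "x"), ("b", "y"), ("c", "q")}
print(overall(real, derived))  # (2 - 1) / 4 = 0.25
print(overall_from_pr(precision=2 / 3, recall=0.5))
```

Both functions agree because (|I| - |F|) / |R| expands to (|I| / |R|) * (2 - (|I| + |F|) / |I|).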

Other Approaches vs. COMA

The evaluation in [131] compared the credibility-based approach for aggregating matcher-specific similarities with the average and meta-learning methods, which are supported by COMA and LSD, respectively. The test schemas and real results were taken from the COMA evaluation (i.e., the Small series; see Section 11.1). The evaluation performed only one measurement using a default system configuration, making it impossible to assess the overall quality behavior of an approach. Given the small problem size, only insignificant quality differences between the approaches (<2% of average Fmeasure) were observed. Only a subset of COMA's matchers was involved, leading to a much lower quality (Fmeasure ~0.7 and Overall ~0.4) than reported in [29] for the no-reuse case, namely, Fmeasure 0.85 and Overall 0.73.
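To make the compared aggregation strategies concrete, the following Python sketch shows the average strategy: the similarity values that several matchers compute for the same element pair are combined into their arithmetic mean. It assumes matcher results are stored as dictionaries keyed by element pair and treats a pair that a matcher did not score as similarity 0.0; the matcher and element names are hypothetical, and neither the credibility-based method of [131] nor LSD's meta-learner is reproduced.

```python
from statistics import mean


def aggregate_average(cube):
    """Combine matcher-specific similarities into one value per element pair.

    cube maps a matcher name to {(source_element, target_element): similarity}.
    Assumption: a pair missing from a matcher's result counts as 0.0.
    """
    pairs = set()
    for sims in cube.values():
        pairs.update(sims)  # collect every element pair scored by any matcher
    return {
        pair: mean(sims.get(pair, 0.0) for sims in cube.values())
        for pair in pairs
    }


# Hypothetical results of two matchers for one source element.
cube = {
    "NameMatcher": {("PO.Qty", "Order.Quantity"): 1.0, ("PO.Qty", "Order.Price"): 0.5},
    "TypeMatcher": {("PO.Qty", "Order.Quantity"): 0.5},
}
combined = aggregate_average(cube)
print(combined[("PO.Qty", "Order.Quantity")])  # 0.75
print(combined[("PO.Qty", "Order.Price")])     # 0.25
```

A weighted variant would replace `mean` with per-matcher weights; the sketch keeps the unweighted case.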

In [87], Lu et al. compared their approach with the tree edit distance matching algorithm of Zhang et al. [143] and with COMA, using the PO schemas and real results taken from the COMA evaluation (i.e., the Small series). While clearly outperforming the tree edit distance algorithm, their algorithm was reported to achieve a small improvement of 2% over the average Overall of 0.73 reported in [29] for the no-reuse matchers of COMA. However, it was unclear whether a systematic evaluation was performed and which configuration of their algorithm achieved this quality. In addition to match quality, a time comparison was performed, but only between their own algorithm and the tree edit distance algorithm. Moreover, the matching time of their own algorithm was measured without considering the expensive preparation phase (import, name/node similarity computation). It is unclear whether such preparation could also benefit the tree edit distance algorithm.

In addition to the PO schemas taken from the COMA evaluation, Dragut and Lawrence used a manually constructed global ontology in their evaluation [39]. The quality of their reuse approach (automatically matching the schemas against the global ontology and composing the match results) was compared with that of using COMA to match the schemas directly, but not with the reuse approach of COMA [29]. It was unclear which configuration (matchers, combination strategies) was used for their implementation and for COMA, respectively, in the comparison. While some schema-to-ontology mappings (obtained using the noMax strategy) even exhibited negative Overall values, it is unclear how composing these mappings could yield high positive Overall values in all tasks. No average quality was reported for an overall comparison of the tested approaches.
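Negative Overall values follow directly from the measure's definition, assuming the form used in the COMA evaluation, Overall = Recall * (2 - 1/Precision). With Recall > 0, the factor (2 - 1/Precision) is negative exactly when Precision < 0.5, i.e., when a result contains more false than correct correspondences. A worked instance with hypothetical counts (|R| = 10 real correspondences; 10 derived ones, of which |I| = 4 are correct and |F| = 6 are false):

```latex
\begin{align*}
\text{Precision} &= \frac{|I|}{|I| + |F|} = \frac{4}{10} = 0.4, \qquad
\text{Recall} = \frac{|I|}{|R|} = \frac{4}{10} = 0.4,\\
\text{Overall} &= \text{Recall}\left(2 - \frac{1}{\text{Precision}}\right)
  = 0.4\,(2 - 2.5) = -0.2 = \frac{|I| - |F|}{|R|} = \frac{4 - 6}{10}.
\end{align*}
```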
