2 State of the Art – Ontology Mapping
2.7 User Evaluation of Mapping Tools
Few user studies of mapping tools have been carried out to date. The following sections present five published user evaluations of mapping tools. Each of the evaluations considers only mappings with one-to-one correspondences.
2.7.1 PROMPT User Study
A user study was conducted on PROMPT [Noy 2002]. Participants downloaded the tool, read through the documentation and the tutorial example, completed a set of tasks, and emailed the results of the tasks to the author. The experiment concentrated on evaluating the correspondence suggestions provided by the tool; it did not evaluate the tool’s usability, the users’ performance in generating correspondences, or the users’ experience with the tool. The experiment also involved only four users, and the author states: “The number of users was still too small and the variability in user’s expertise with Protégé too large to get meaningful estimates”. The sample is therefore too small to support any significant conclusions.
2.7.2 PROMPT and Chimaera User Study
Lambrix and Edberg [Lambrix 2003] performed a user evaluation of PROMPT and Chimaera for the specific use case of merging ontologies in bioinformatics. The user experiment involved eight users, four with computer science backgrounds and four with biology backgrounds. Participants were given a set number of mapping tasks to perform. A user manual was provided, and the software help system was available to support the users while they performed the mapping tasks. The experiment was an observational case study, and participants were instructed to “think aloud” while an evaluator took notes. Afterwards the users completed a questionnaire about their experience. The user interfaces were evaluated using the REAL (Relevance, Efficiency, Attitude and Learnability) approach [Lowgren 1993]. PROMPT outperformed Chimaera; however, participants found it difficult to learn how to merge ontologies with either tool. The participants also found it particularly difficult to perform any non-automated procedures in PROMPT, such as creating user-defined merges. The authors investigated the difference in the user interface evaluation between the two groups of users and found that “there was no significant difference between the results of these two groups”. This suggests that the two groups evaluated the user interfaces similarly. However, the evaluation contained no analysis of the users’ performance with the mapping tools. It analysed only the quality of the candidate correspondences suggested by each tool, not the correspondences generated by the users with the mapping tools.
2.7.3 Visualisation Mapping User Study
Robertson et al. [Robertson 2005] used a user study to evaluate their novel approach to the visualisation of mappings, which applies several standard visualisation techniques to improve the interface of mapping tools (see section 2.5.2). The BizTalk mapping tool was used as the baseline for the evaluation, as the prototype was built on top of BizTalk. Several of the visualisation techniques in the prototype could be turned off, which allowed four different versions of the prototype to be tested against BizTalk to assess the impact of each feature. The user experiment involved eight users, all of whom had experience using BizTalk. The experiment was an observational case study, and the participants were required to complete a user satisfaction questionnaire and an interview after the experiment. The results revealed that the new feature set received significantly higher ratings. In this experiment the efficiency of the user performance was evaluated by analysing the mapping task time. However, the effectiveness of the user performance, that is, the quality of the mappings generated, was not evaluated. The usability of the mapping tools was measured with a customised questionnaire.
2.7.4 Schema Mapper User Study
Schema Mapper [Raghavan 2005] was evaluated against MapForce¹⁴, a graph-based XML mapping tool, in a user study. The user experiment involved nine participants, five of whom had previous experience with XML editing. Each user was given instructions on how to use each tool. The users were also given additional reading material for both tools and asked to explore the tools until they felt comfortable. The experiment was an observational case study. The results revealed that users found it easier to locate nodes with Schema Mapper, as the hyperbolic tree required less scrolling and searching than the graph-based interface of MapForce. The recommendations from Schema Mapper also helped users navigate the mappings. The users found Schema Mapper easier to use because they found the lines that MapForce draws across the screen to denote mappings confusing. Finally, users did not find it confusing to see only the local view of a term when mapping in Schema Mapper; they preferred it and would use this view again for mapping. Once again, only the efficiency of the user performance was evaluated, using the time taken to perform a mapping task.
2.7.5 CoGZ and PROMPT User Study
CoGZ was evaluated against PROMPT in a user study [Falconer 2009]. The user experiment involved 18 participants, all recruited from the University of Victoria computer science department. The experiment was an observational case study, and participants were instructed to “think aloud” while an evaluator took notes. Each user attempted to complete nine mapping tasks, each representing a different level of task, with both tools. The procedure was as follows: first the user received training with the tool, then they practised using the tool with a different set of ontologies, next they completed the set of mapping tasks for the tool, and finally they completed a SUS (System Usability Scale) questionnaire to measure user satisfaction. The tools were used in different orders, and an interview was conducted with each user after they had finished using both tools. Both the efficiency and the effectiveness of the user performance were evaluated for both mapping tools. The results revealed that CoGZ improved the mapping performance of the users and received a better user satisfaction response.
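To make the SUS measure concrete, the following minimal sketch applies the standard SUS scoring rule to the ten ratings given by one participant. It is an illustration only and is not taken from [Falconer 2009]; the function name `sus_score` and the example ratings are hypothetical.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant.

    `responses` holds the ten item ratings in questionnaire order,
    each on the scale 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")

    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute rating - 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - rating.
            total += 5 - rating
    # The summed contributions (0-40) are scaled to the range 0-100.
    return total * 2.5


# Hypothetical ratings from one participant for one tool.
example_ratings = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example_ratings))
```

In a within-subject design such as the one described above, each participant would produce one such score per tool, and the two sets of scores could then be compared.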
¹⁴ http://www.altova.com/mapforce.html
2.7.6 Summary
Table 2-6 displays a summary of evaluation methods used by the experiments detailed in the previous section.
Table 2-6: Summary of evaluation methods in mapping tool user studies

| | PROMPT | PROMPT & Chimaera | Visualisation Mapping | Schema Mapper | CoGZ & PROMPT |
|---|---|---|---|---|---|
| Background of Participants | Medical | Medical & Computer Science | Computer Science | Computer Science | Computer Science |
| Participants’ Mapping Experience | Novice | Novice | Expert | Novice & Expert | Novice & Expert |
| Number of Users | 4 | 8 | 8 | 9 | 18 |
| Experiment Type | Field | Controlled | Controlled | Controlled | Controlled |
| Experiment Protocol | None | Think-aloud | Observational | Observational | Think-aloud |
| Focus of Experiment | Tool | Tool | User | Tool | User |
| Group Compareᵃ | No | Yes | No | No | No |
| Statistical Tests Used | No | No | ANOVA | No | t-Test |
| Questionnaire Used | None | Yes | Yes | None | SUS |
| Interviews | No | No | Yes | Yes | Yes |

ᵃ The user study compares the mapping results of different groups of users.

Only one experiment took place in the users’ work environment. However, that experiment neither evaluated user performance nor gathered feedback from the users of the tool. The others took place in a controlled lab environment where an evaluator observed the participants interacting with the mapping tools. Two of the experiments required the users to think aloud while performing mapping tasks. The think-aloud protocol has several problems: participants may feel uneasy about having to speak, and prompting a silent participant to speak may become overbearing, which can affect the participant’s performance. In the author’s opinion it may be better for a controlled lab experiment to be observational and to try to replicate the natural setting of the user’s work environment.

The majority of the experiments in this section focused on measuring the performance of the mapping tool rather than the user’s performance with the mapping tool. Table 2-7 displays a summary of the areas of user evaluation and the metrics used by the experiments referred to in this section.
Table 2-7: Summary of Mapping Tool Evaluation Areas

| | PROMPT | PROMPT & Chimaera | Visualisation Mapping | Schema Mapper | CoGZ & PROMPT |
|---|---|---|---|---|---|
| Efficiency | No | Questionnaire | Task Time | Task Time | Task Time |
| Effectiveness | No | No | No | No | Gold Standard |
| User Satisfaction | No | Questionnaire | Questionnaire & Interview | Interview | Questionnaire & Interview |
| Accessible | No | No | No | No | No |
| Convenience | No | No | No | No | No |
| Simpleᵃ | No | Questionnaire | Interview | Interview | Interview |

ᵃ Evaluates how easy the goal of generating mapping is to accomplish for users.

The majority of the experiments measured the efficiency of users generating mappings with the mapping tool, using the time taken to complete a mapping task as the metric. Only one of the experiments measured the effectiveness of users generating mappings with the mapping tool, by evaluating the quality of the mappings generated by the participants against a gold standard¹⁷. Questionnaires and interviews were used in the experiments to measure user satisfaction. In particular, how simple a mapping task was to perform was judged from the responses given to questionnaires and the comments given in the interviews. None of the experiments evaluated whether the mapping tools were accessible or convenient to use.
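Comparing a participant’s mapping against a gold standard can be illustrated with precision, recall and F-measure, the measures commonly used in ontology matching evaluation. The sketch below is a minimal illustration under stated assumptions and is not the procedure reported in [Falconer 2009]: a mapping is assumed to be a set of one-to-one (source term, target term) pairs, and the function name `evaluate_mapping` and the example terms are hypothetical.

```python
def evaluate_mapping(user_mapping, gold_standard):
    """Compare a user's mapping with a gold standard mapping.

    Both arguments are iterables of (source_term, target_term) pairs.
    Returns a (precision, recall, f_measure) tuple.
    """
    user = set(user_mapping)
    gold = set(gold_standard)
    correct = user & gold  # correspondences the user got right

    # Precision: share of the user's correspondences that are correct.
    precision = len(correct) / len(user) if user else 0.0
    # Recall: share of the gold standard that the user found.
    recall = len(correct) / len(gold) if gold else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical gold standard and participant mapping.
gold = {("Person", "Human"), ("Author", "Writer"),
        ("Paper", "Article"), ("Venue", "Conference")}
user = {("Person", "Human"), ("Author", "Writer"), ("Paper", "Venue")}

precision, recall, f_measure = evaluate_mapping(user, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f={f_measure:.2f}")
```

Treating correspondences as plain pairs fits the one-to-one mappings considered by the studies in this section; more complex correspondences would need a richer representation.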
User experiments are necessary for the improvement of mapping tools [Bernstein 2007] [Falconer 2009] [Shvaiko 2008]. However, other than the few examples in this section, little user evaluation of ontology mapping tools has been published. These user experiments also evaluated only the usability of the mapping tools and did not evaluate the accessibility or convenience of using the tools in the users’ computing environment. Moreover, there is no standard benchmark defining how to perform user evaluation of mapping tools. A proposed mapping evaluation benchmark, STBenchmark [Bogdan 2008], presents a simple usability model that evaluates the usability of mapping tools through the effort the user exerts, measured by the number of keystrokes and buttons pressed, but this model still lacks necessary feedback from users. For example, it is important to measure the cognitive load of the task, which is not captured by the number of keystrokes and buttons pressed. Finally, the majority of the user experiments that have been carried out were conducted in a laboratory setting, and only one published study was a field test conducted in the user’s computing environment. If mapping usage is to increase, user experiments need to be conducted in the users’ own work environments to gain an understanding of user behaviour with the tool in a non-laboratory setting.
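An effort count of the kind described above, based on keystrokes and button presses, can be sketched as follows. This is a minimal illustration and not STBenchmark’s actual implementation: the interaction log format, a list of (kind, detail) pairs, and the event kinds "key", "click" and "hover" are assumptions made for the example.

```python
from collections import Counter


def interaction_effort(events):
    """Count the keystrokes and button presses in an interaction log.

    `events` is an iterable of (kind, detail) pairs; only events of
    kind "key" or "click" count towards the effort.
    """
    counts = Counter(kind for kind, _detail in events)
    return counts["key"] + counts["click"]


# Hypothetical log of one participant creating one correspondence.
example_log = [
    ("click", "source:Person"),
    ("click", "target:Human"),
    ("key", "Enter"),
    ("hover", "target:Agent"),  # not counted as effort
    ("click", "accept"),
]
print(interaction_effort(example_log))
```

The sketch also shows the limitation noted above: the time a participant spends deciding between candidate terms leaves no trace in such a count.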
¹⁷ A gold standard is the optimum mapping between two ontologies, generally constructed by a knowledge engineer, and is used to measure the quality of other mappings between the two ontologies.