Chapter 5 – Evaluation
5.3 Methodology
5.3.2 Evaluation Techniques
Given their unbiased position, evaluation participants are uniquely placed to provide valuable feedback which can be used to evaluate the effectiveness of LAVA. As such, their responses to interactions with the system formed a large part of the evaluation process. With such a strong focus on obtaining information about participants’ interactions with LAVA, it was important to minimise any reactivity, experimenter or observer effects [244-246] introduced during the evaluation sessions which could adversely impact the internal validity of the evaluation process. The first approach adopted to achieve this was the application of several evaluation techniques at each stage of the process, as described below:
• Questionnaires: A summary of the properties offered by questionnaires is provided in Appendix G.1. With respect to the user evaluation process, anonymised questionnaires were used at the start of each session in order to obtain a standardised set of demographic data for the purpose of characterising the participant population. Within each session, questionnaires were used to allow participants to consider their responses without interference or feedback from evaluators. Comprising a mixture of multiple-choice and open-ended, unstructured questions, questionnaires were used to solicit participant responses relating to specific components within the system. This approach allowed multiple datasets to be obtained and analysed in a uniform way, thereby providing detailed evaluation of several areas of the system. It was also an ideal way of soliciting and exploring user perceptions with regard to the usability of the software and the perceived realism of the archaeological excavation scenarios presented. Owing to the difficulty associated with analysing the responses to open-ended questions [247], their use was minimised within questionnaires, with alternative techniques being used to solicit more detailed participant responses.
• Structured Interviews [248] were used to explore, in more detail, specific areas of LAVA that had been highlighted in group evaluation sessions or responses to participant questionnaires. Given the properties of the interviewing process, as summarised in Appendix G.2, the emphasis that structured interviews place on active participant involvement was used as an opportunity to solicit participant opinions regarding the user experience provided by LAVA. Participants were asked open-ended questions, with the evaluator following up on initial responses in order to gain detailed insight into user behaviour whilst using the system. On some occasions, participants were interviewed whilst actively engaging with a LAVA simulation. The interviewing process helped to provide an overview of the system and highlighted areas that participants perceived to be performing well and areas that were perceived to be lacking. This feedback was used to determine future priorities during the development process, and was also fed back into the evaluation process to ensure problem areas were re-examined following updates and enhancements applied during the development cycle.
• Individual and Group Observation techniques, as summarised in Appendix G.3, were used to analyse the way in which particular tasks within LAVA were approached by users. By following participants’ progress through excavation scenarios, valuable usage data was obtained for the evaluation process. As with structured interviews, this data was able to highlight areas of LAVA that were performing well and areas that required further development work. When undertaking individual observations, talk-aloud protocols [249] [250] were used to encourage participants to verbalise their thought processes whilst interacting with LAVA. Conversation [251] and discourse analysis [242, 243] were both used in group settings, again to analyse how participants approached each excavation scenario. Unlike talk-aloud protocols, discourse analysis does not require participants to actively verbalise their actions as they work, but instead relies on analysing the natural communication which occurs when cooperating within group environments. In this way discourse analysis is less intrusive and less likely to introduce experimenter or Pygmalion effects [252, 253]. However, as participants are not actively verbalising their actions, analysing data obtained through discourse analysis is more time-consuming owing to the need to infer meaning from user behaviour, something which would otherwise have been made explicit by participants if they were using talk-aloud protocols.
• Closely linked to individual and group observation, Co-Participation techniques [254], the properties of which are summarised in Appendix G.4, were used in group evaluation sessions to encourage participants to verbalise to fellow group members, in a natural way, their interactions with LAVA. By encouraging users to verbally dissect their rationale and approach, co-participation provides evaluation data outlining the ways in which group members organise their interactions with the system. Through verbalising their thought processes, users reveal the aspects of LAVA which support and encourage their engagement with the system, as well as those which cause confusion, frustration or difficulty when engaging with the excavation process.
• Written records were also maintained for each evaluation session by domain experts, tutors delivering the archaeology content in the session, LAVA developers and evaluators. This approach was adopted in order to provide a review of each evaluation session from several perspectives, allowing each to be analysed and considered in the evaluation of the overall suitability of the framework. The soliciting of several different perspectives also added clarity to the evaluation process, with feedback from domain experts and course tutors helping to shape the timetables followed in future evaluation sessions.
In an effort to encourage frank and truthful responses, a separation between the evaluator and the LAVA software was emphasised to participants in all evaluation activities. It was also made clear to participants that their actions were not being evaluated, but that it was LAVA and the way the software handled participant interactions that was of interest in the evaluation process. This separation was adopted to minimise the possibility of the Pygmalion effect [252] affecting participant behaviour. To further reduce the risk of observer effects, standardised documentation was used throughout each evaluation session. In this way experimenter bias [255] could be more closely managed, with all session instructions being carefully drafted to ensure an unbiased approach throughout each of the sessions. Not only did this approach ensure that participants were not given any indicators as to how they should respond during the evaluation process, but it also facilitated comparisons between evaluation sessions owing to the standardised organisation and timing allocated to each activity.
When considering structured interviewing, experimenter effects were more difficult to manage owing to the interactive nature of the interviewing process. In an effort to minimise the possibility of introducing bias, interviewees were encouraged to be the most active participants within the session, with interviewers acting as facilitators, introducing topics and prompting for further detail from the interviewee when appropriate. In recognition of the fact that the interview process was more susceptible to experimenter effects than other aspects of the evaluation process, interview data was used primarily to corroborate and add context to data obtained through other channels, thereby reducing the possible impact of experimenter bias on the validity of the evaluation process.