Chapter 8: Combined Field Study
8.2.1. Approach
IV evaluation presents various challenges (Riche, 2010), and there is a need for alternative evaluation methods (Shneiderman and Plaisant, 2006). Although controlled experiments, such as usability evaluations, are useful for identifying usability problems, they do not support evaluating the effectiveness of data exploration and are not specifically focussed on IV evaluation (Riche, 2010; Shneiderman and Plaisant, 2006). Eye tracking is commonly used to support these usability evaluations, especially to provide a better understanding of exploration approaches (Goldberg and Helfman, 2010), but such evaluations remain controlled.
Various methods have been developed and/or applied for the specific purpose of evaluating IV techniques and tools. These methods include grounded theory and grounded evaluation, focus groups, crowdsourcing and a customised case study approach.
8.2.1.1. Grounded Theory / Evaluation
Grounded evaluation makes use of qualitative analysis to ensure that the subsequent evaluation of IV techniques and tools takes their intended use into account (Isenberg et al., 2008). The process first involves understanding the context in which the proposed IV techniques and tools will be used, in terms of the data, the tasks, the techniques currently used and the processes in which they are used. This understanding is then used to derive the initial design, to identify which interim evaluation methods would be appropriate and to identify evaluation criteria. This evaluation method is user-centred and is used to develop theory based on the subjective experiences of users (Faisal et al., 2008). Grounded evaluations are, however, targeted towards the early stages of the development of IV techniques and tools (Isenberg et al., 2008). As the MyPSI prototype had already been designed and implemented by this stage of the research, an alternative and more suitable evaluation approach was needed.
8.2.1.2. Focus Groups
Focus group evaluations are used to gather qualitative data and identify unexpected problems (Mazza and Berre, 2007). Focus group interviews are conducted by a facilitator with a group of users, using open-ended questions through which the users identify any concerns about the IV tool. The facilitator asks the users questions relating to the usefulness of the system and to cognitive tasks. An advantage of focus groups is that a number of user perspectives can be captured simultaneously, allowing users to build on other users’ comments and ideas. A shortcoming of this evaluation method is that the users do not interact with the IV techniques and tools themselves, and thus may not identify problems and features that only become apparent through interaction with the actual system (Kinnaird and Romero, 2010).
8.2.1.3. Crowdsourcing
A recent development in IV evaluation is crowdsourcing (Heer, 2010). Crowdsourcing provides an alternative, lightweight approach to IV evaluation, in which participants are recruited online to complete small tasks with an IV technique or tool. Potential advantages of crowdsourcing for evaluating IV techniques and tools have been identified, including reduced evaluation costs and more practical evaluations. Unfortunately, this evaluation method suffers from a number of challenges and shortcomings, such as the unreliability of participants and results, and crowdsourcing has not yet been widely applied in the IV field. Additionally, in this field study the mobile application needed to be installed on each tablet device, so crowdsourcing was not possible for the tablet version.
8.2.1.4. Multi-dimensional In-depth Long-term Case Studies Approach
The Multi-dimensional In-depth Long-term Case Studies (MILC) approach makes use of observations, interviews, questionnaires and system logging to evaluate performance as well as user interface (UI) efficiency and utility (Shneiderman and Plaisant, 2006). This evaluation method uses case studies to gather detailed results from a few users who use an IV technique or tool with their own data in their own environment, i.e. “in the wild”, over an extended period of time. The MILC approach follows an ethnographic approach whereby IV designers collaborate with expert users to analyse the experts’ own data over a period of time (Riche, 2010).
Of the evaluation methods discussed above, the MILC approach most closely meets the requirement of evaluating the prototype “in the wild” by means of a field study. Additionally, the literature on the MILC approach is extensively detailed, so the approach can be readily applied and replicated within this research. Although qualitative evaluation methods such as MILC and grounded evaluation suffer from limitations, such as the time required to capture and analyse data, possible bias introduced by the observer and the difficulty of reproducing and generalising results (Riche, 2010), these limitations can be mitigated through careful design of the study.
8.2.2. Aims and Objectives
The aim of the field study was to determine the usefulness of the IV techniques incorporated in the MyPSI prototype in supporting access to personal information (PI) across multiple devices, specifically desktop and tablet devices. Due to time constraints, the field study was limited to a two-week period. The field study was also used to analyse participants’ subjective experiences of using the MyPSI prototype.
8.2.3. Participants
Each participant in the field study completed an electronic biographical questionnaire similar to that of the preliminary user study (Appendix D). Participants were required to be currently managing their PI across at least two devices and to be free during one of the two scheduled two-week periods.
The field study was conducted with students and staff from the Department of Computing Sciences and the School of Information Communication and Technology (ICT) at the Nelson Mandela Metropolitan University (NMMU). Initially, 20 participants agreed to participate in the field study, although only 13 participants completed the entire field study. Table 8-1 outlines the distribution of the participants in terms of completion.
Table 8-1 Distribution of Participants in Terms of Completion
Number of Participants
Completed 13
Started but did not finish 3
Did not start 4
Total 20
Three participants commenced the evaluation but did not finish it. Two of these participants withdrew during the first few days of the field study due to work commitments. The third participant withdrew after completing the evaluation on the desktop device, as s/he perceived the MyPSI prototype to contain several usability problems and did not wish to continue. Upon inspection, it was determined that this participant had not viewed any of the video tutorials beforehand and therefore did not know the intended interaction for each function within the prototype. Four participants completed the informed consent form and the biographical questionnaire, but did not complete any of the tasks or log book diaries and subsequently withdrew from the study.
Participants were aged between 21 and 49 years. Most participants (10) had at least 10 years of computing experience, while three participants had 6-9 years of computing experience. Figure 8-1 shows that most participants (9) managed their PI using a digital device daily, while the remaining participants (4) did so weekly.
Figure 8-1 Frequency in Managing PI Using a Digital Device (n=13)
Figure 8-2 depicts the number of devices each participant uses to manage PI, as well as the device each participant considers to be his/her main device for managing PI. Nine participants use four devices to manage their PI, two participants use three devices and the remaining two participants use two devices (Figure 8-2a). Most participants (8) consider their desktop personal computer (PC) to be their main device for managing PI, while the remaining participants consider their laptop (4) or mobile phone (1) to be their main device (Figure 8-2b).
Figure 8-2 (a) The Number of Devices used to Manage PI and (b) the Main Device for Managing PI (n=13)
8.2.4. Evaluation Metrics
The MILC approach makes use of various means to capture data within an evaluation, including participant observation, interviews, surveys and system logging. Following the MILC approach, this field study used system logging, participant logging (log book diaries) and questionnaires to capture various types of data. As IV techniques and tools are focussed on data exploration, user performance in terms of efficiency was excluded from the field study. The following metrics and associated data capture methods were used for the field study:
a) Effectiveness – Log Book Diaries and System Logging
b) User Satisfaction – Questionnaires
c) Qualitative Comments – Log Book Diaries and Questionnaires
System logging has been identified as one of the key methods for capturing data using the MILC approach (Shneiderman and Plaisant, 2006), and is useful for determining user behaviour and exploration patterns (Pohl, Wiltner and Miksch, 2010). In this field study, system logging was used to identify errors in the interaction and to determine whether participants were able to complete their tasks.
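The section does not describe the format of the MyPSI log files. As a purely illustrative sketch, assuming a hypothetical CSV log with one row per event (`participant,day,event,detail`, where `event` is either `task_done` or `error`), the two uses of logging described above, identifying interaction errors and checking task completion, could be summarised per participant as follows. The function name `summarise_log` and the event names are invented for this example.

```python
import csv
import io
from collections import defaultdict


def summarise_log(log_text: str, required_tasks: set[str]) -> dict[str, dict]:
    """Summarise a hypothetical event log per participant.

    Assumed (not documented) row format: participant,day,event,detail
    where event is "task_done" (detail = task name) or "error" (detail = message).
    """
    completed: dict[str, set[str]] = defaultdict(set)
    errors: dict[str, int] = defaultdict(int)

    for row in csv.reader(io.StringIO(log_text)):
        if len(row) != 4:  # skip blank or malformed rows rather than crashing
            continue
        participant, _day, event, detail = (field.strip() for field in row)
        if event == "task_done":
            completed[participant].add(detail)
        elif event == "error":
            errors[participant] += 1

    summary = {}
    for participant in sorted(set(completed) | set(errors)):
        summary[participant] = {
            # Required tasks with no completion event in the log
            "missing_tasks": sorted(required_tasks - completed[participant]),
            "errors": errors[participant],
        }
    return summary
```

A participant with no logged events does not appear in the summary at all, which keeps a missing log file distinguishable from a participant who simply logged no errors.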
Participant observation may be considered an intrusive evaluation method, especially when participants are managing PI (Brehmer et al., 2014). Alternative methods, such as log books and system logging, can therefore be used to address this issue. One of the guidelines of the MILC approach is to provide participants with a log book for recording problems, insights and general comments. After completing each day’s set of tasks, a participant was required to complete a log book diary in the form of an electronic questionnaire, which was the same for each day of the field study and for each device evaluation. The log book diary used the After-Scenario Questionnaire (ASQ) (Lewis, 1995), with the addition of the participant number for administration and capturing purposes and an open-ended section for general comments.
Forsell and Cooper (2012) proposed creating standard questionnaires for IV and noted that more focus needs to be placed on subjective measures for evaluating IV techniques and tools. When the field study commenced, no questionnaires could be found specifically for IV evaluation. User satisfaction and qualitative comments were therefore captured using a post-test questionnaire similar to that of the preliminary user study (Section 6.2.3). A post-test questionnaire was provided for each of the desktop and tablet devices. The post-test questionnaire used the NASA-TLX form (Hart and Staveland, 1988) to measure cognitive load and the Computer System Usability Questionnaire (CSUQ) (Lewis, 1995) to capture overall satisfaction, usability and general comments. The same questions as in the preliminary user study (Section 6.2.3) were used for each IV technique incorporated in the MyPSI prototype. These questionnaires used 5-point Likert scales to make them simpler for participants to complete. An additional question was added to the post-test questionnaires for the field study, namely “Would you consider using this system in future?”
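The section does not state how the NASA-TLX responses were scored. A minimal sketch, assuming the unweighted "Raw TLX" variant (the mean of the six subscale ratings, without the pairwise-comparison weights of the original procedure), and assuming each subscale was answered on the same 5-point scale and coded so that a higher value means higher workload, is shown below. The subscale keys and function names are chosen for this example only.

```python
from statistics import mean

# The six NASA-TLX subscales (Hart and Staveland, 1988); key names are illustrative.
TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def reverse_code(rating: int, points: int = 5) -> int:
    """Flip a rating on a 1..points scale, e.g. 5 -> 1 and 2 -> 4 on a 5-point scale."""
    return points + 1 - rating


def raw_tlx(ratings: dict[str, int]) -> float:
    """Unweighted (Raw TLX) workload score: the mean of the six subscale ratings.

    Assumes every rating is on the same 1-5 scale with higher = more workload.
    If performance was asked as "how successful were you" (higher = better),
    pass reverse_code(performance) instead.
    """
    missing = [s for s in TLX_SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return float(mean(ratings[s] for s in TLX_SUBSCALES))
```

The weighted variant would instead multiply each rating by the number of times that subscale was chosen in the 15 pairwise comparisons and divide the sum by 15.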
All questions included in the electronic log book diaries and the post-test questionnaires used a 5-point Likert scale, and descriptive statistics, including the mean and median, were calculated for each question. Each value of the 5-point Likert scale had an associated meaning, i.e. 1: Strongly Disagree, 2: Disagree, 3: Neutral, 4: Agree and 5: Strongly Agree. A mean rating greater than 3.4 was interpreted as a positive result, i.e. one corresponding to a rating of Agree (4) or Strongly Agree (5) on the Likert scale.
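The text does not say how the 3.4 cut-off was derived. One common convention that produces exactly this value divides the range of a 1-5 scale into five equal intervals of width (5 - 1) / 5 = 0.8, so that means above 3.4 fall in the Agree (3.41-4.20) or Strongly Agree (4.21-5.00) interval. A minimal sketch of the per-question descriptive statistics under that assumption (the interval bounds and function names are illustrative, not taken from the study):

```python
from bisect import bisect_left
from statistics import mean, median

LABELS = ("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
# Assumed upper bounds of the first four equal-width intervals (width 0.8).
UPPER_BOUNDS = (1.8, 2.6, 3.4, 4.2)
POSITIVE_CUTOFF = 3.4  # cut-off stated in the text


def interval_label(mean_rating: float) -> str:
    """Map a mean rating in [1, 5] to the label of its (assumed) interval."""
    if not 1 <= mean_rating <= 5:
        raise ValueError("mean rating must lie between 1 and 5")
    # bisect_left counts the bounds strictly below the mean, so a mean of
    # exactly 3.4 stays in "Neutral" while 3.41 moves to "Agree".
    return LABELS[bisect_left(UPPER_BOUNDS, mean_rating)]


def summarise_question(ratings: list[int]) -> dict:
    """Descriptive statistics for one 5-point Likert question."""
    if not ratings or any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers 1-5")
    m = mean(ratings)
    return {
        "mean": m,
        "median": median(ratings),
        "label": interval_label(m),
        "positive": m > POSITIVE_CUTOFF,
    }
```

Reporting the median alongside the mean is useful because Likert responses are ordinal, so the median is not affected by treating the scale points as equally spaced.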
8.2.5. Tasks
Section 5.2.3 identified several main functions of the MyPSI prototype. As in the preliminary user study, the tasks for the field study were derived from these functions, with the addition of the linking functionality. These high-level tasks were mapped directly from the required functionality and included the following:
a) Data Manipulation;
b) Semantic Zooming;
c) Sorting;
d) Intelligent Browsing;
e) Intelligent Searching;
f) Filtering;
g) Tagging; and
h) Linking.
It needed to be determined whether participants should be provided with a predefined set of tasks or allowed to explore the prototype freely. According to Stone et al. (2005), there are various levels of control over participants’ tasks: each task is predefined by the facilitator; participants can comment on suggested tasks and add their own tasks; participants are offered a choice between predefined tasks and their own tasks; or participants are required to suggest their own tasks. Allowing participants to create their own tasks means that results from different participants cannot be compared, and thus moves the evaluation into an exploratory domain. Providing a predefined set of tasks allows increased control and ensures that participants evaluate each aspect of the system, but leaves little room to explore the system and is therefore too restrictive. Offering participants a choice between task lists also makes it difficult to compare results between participants. The most appropriate level of task control was therefore to provide participants with a task list and encourage them to explore the prototype further, as the functions of the MyPSI prototype are well defined. This control level is also the most balanced, as it ensures that each aspect of the prototype is evaluated and that results can be compared between participants.
Similar tasks were identified for the desktop and tablet versions of the MyPSI prototype. The tasks were kept direct but as general as possible: because the structure of each participant’s data was unknown, the tasks were worded so that each participant could complete them with his/her own data. Each main task was included at least twice within the evaluation on each device, to reduce the influence of unfamiliarity on participants’ experiences of the tasks.
Each main task included sub-tasks, which required participants to use the appropriate IV techniques to complete them.
Table 8-2 Main Task Groupings per Day for both the Desktop and Tablet Devices
Main Functions
Day 1 Data Manipulation & Semantic Zooming
Day 2 Browsing & Sorting
Day 3 Searching & Filtering
Day 4 Tagging & Linking
The grouping of the main tasks is shown in Table 8-2. The detailed task list for each day of the field study is provided in Appendix H.
8.2.6. Equipment
All participants had access to a desktop or laptop computer with which to complete the tasks relating to the desktop version of the MyPSI prototype in the first week of the field study. Two participants used their own tablet devices to complete the tasks in the second week of the field study: one used an older Android tablet and the other used a Samsung Galaxy Tab 4. The remaining participants used loaned tablet devices, two of which were Samsung Galaxy Tab 4 tablets; the rest were older Samsung Galaxy Tab 2 devices.
8.2.7. Procedure
An email detailing the procedure of the field study was sent to potential participants, requesting their participation in the study. The field study was run in two consecutive two-week periods, with tasks scheduled from Monday to Thursday of each week. As soon as participants agreed to participate, they were sent their specific participant information, including an assigned participant number to be used throughout the field study for reference and to ensure confidentiality and anonymity in the results. Participants were required to consent to participate in the study by completing an electronic consent form (Appendix G), and were also requested to complete a biographical questionnaire similar to that of the preliminary user study (Appendix D). Before commencing the field study tasks, each participant was sent an instruction manual on how to sign in and upload his/her PI to Dropbox. The MILC evaluation approach requires that training on the prototype be provided to each participant (Shneiderman and Plaisant, 2006). To keep the field study as unobtrusive as possible, video tutorials were created for each day of the field study. The first day additionally included video tutorials on the sign-in process and the UI of the MyPSI prototype. On each day of the field study, participants were required to view the relevant video tutorials and then commence the tasks for that day. After completing the day’s tasks, participants completed the log book diary for that day. At the end of each week, which represented the evaluation of either the desktop or the tablet version, participants were provided with the link to the electronic post-test questionnaire for that week (Appendices J and K). The desktop version was evaluated first, as the desktop PC was the main device for managing PI for most participants (Figure 8-2b). Table 8-3 depicts each week of the field study and the version of the prototype evaluated. After the second week of the field study, each participant was thanked for his/her participation and any loaned tablet devices were returned.
Table 8-3 Field Study Weeks with the Relevant MyPSI Version Evaluated
MyPSI Version
Week 1 Evaluate the web page using https://www.mypsi.co.za on a desktop or laptop device with mouse-based interaction
Week 2 Evaluate the PhoneGap application on an Android tablet device with touch-based interaction.
8.3. Evaluation Results
The field study results are discussed in terms of effectiveness and satisfaction. The qualitative results captured through the log books as well as the post-test questionnaires conclude the Results section.
8.3.1. Effectiveness Results
The system logging for both the desktop and tablet versions of the MyPSI prototype did not reveal any notable results. Unfortunately, the logging was also not captured correctly for some participants. As no Append function was available for adding new information to an existing file, the logging method read the log file’s existing contents, added the new information to these contents and then wrote the combined content back to the file, overwriting it. The read operation was occasionally unreliable and failed to retrieve the existing content, in which case the new information overwrote the file’s previous contents. The log files that were captured correctly show that participants completed only the required tasks and did not explore the system further. Additionally, the participants followed similar processes to complete the tasks, as they closely followed the steps in the video tutorials provided for training.
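The logging failure described above is a classic read-modify-write problem: when a read fails but is treated as an empty file, the write that follows silently replaces the earlier entries. The section does not identify the storage API, so the sketch below abstracts it as two hypothetical callables, `read(path)` and `write(path, content)`, and assumes `read` returns `""` for a file that does not yet exist and `None` when the read itself fails. It contrasts the behaviour described in the text with a guarded variant that retries the read and refuses to write if it cannot obtain the existing contents.

```python
from typing import Callable, Optional

Reader = Callable[[str], Optional[str]]  # hypothetical: "" = new file, None = failed read
Writer = Callable[[str, str], None]      # hypothetical: replaces the whole file


class LogReadError(Exception):
    """Raised when the existing log contents could not be read."""


def naive_append(read: Reader, write: Writer, path: str, entry: str) -> None:
    # Behaviour described in the text: a failed read is indistinguishable from
    # an empty file, so the write below overwrites all earlier entries.
    existing = read(path) or ""
    write(path, existing + entry + "\n")


def guarded_append(read: Reader, write: Writer, path: str, entry: str,
                   attempts: int = 3) -> None:
    # Only write once the existing contents have been read successfully;
    # otherwise leave the old file untouched and report the failure.
    for _ in range(attempts):
        existing = read(path)
        if existing is not None:
            write(path, existing + entry + "\n")
            return
    raise LogReadError(f"could not read {path!r} after {attempts} attempts")
```

Raising instead of writing means a single entry can be lost, but never the whole file. An alternative that avoids reading altogether would be to write each log entry to its own file. Neither approach makes the update atomic if two clients write to the same file concurrently.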
Fortunately, the log book diaries provided detailed information for each set of tasks completed on each day of the field study. All participants fully completed the log book diaries for each day of the field study. The groupings of the high-level tasks are repeated in Table 8-4 for ease of reference within this section.
Table 8-4 Main Task Groupings per Day for both the Desktop and Tablet Devices
Main Functions
Day 1 Data Manipulation & Semantic Zooming
Day 2 Browsing & Sorting
Day 3 Searching & Filtering
Day 4 Tagging & Linking
The first question in the log book diaries for both versions of the MyPSI prototype related to the ease of completing the tasks for that day. The mean ratings for the first question for each day of the field study, for both the desktop (Week 1) and tablet (Week 2) versions of the MyPSI prototype, are displayed in Figure 8-3.
The sets of tasks for each day of the field study received positive ratings in terms of ease of use (Figure 8-3). Figures 8-3(a) and (b) show that participants rated the desktop version of the MyPSI prototype as easier to use for data manipulation and semantic zooming, searching and filtering, and tagging and linking, whereas they found browsing and sorting slightly easier using the tablet version (Figure 8-3b).
Figure 8-3 Mean Ratings for the Log Book Diaries for Q1 (n=13)
Figure 8-4 displays the mean ratings for the log book diaries for the second question, which related to the time taken to complete the sets of tasks for each day of the field study. Participants were