5.5 Results and Analysis
5.5.1 Demographics
Fifteen teams of two participants each participated in the experiment. For the analysis of user satisfaction metrics such as effectiveness and efficiency, the users’ responses were taken individually (n=30). For performance metrics such as time on task and collaboration, data for the team was used (n=15). The results presented in this section are therefore discussed either in terms of participant results or team results.
5.5.1.1 Participant demographics
Participants were mainly recruited from the Wi-Fi area in the Computing Sciences Department, which is made available to students from all departments. Several postgraduate students from the Computing Sciences Department were also asked to participate. Teams were formed from people who were already sitting in pairs, on the assumption that they knew each other or were used to working together.
Figure 5.1: Participant Demographics (n=30)
Figure 5.1 shows that a good, representative spread of participants was achieved. There were slightly more females than males. Thirteen per cent were left-handed, which would appear to approximate the 10% of the population believed to be left-handed (Hardyck and Petrinovich, 1977). The participants were screened for colour blindness; none of the participants were colour blind. Half of the participants held or were reading towards postgraduate degrees, while the other half were studying towards their first degree or held a school-leaver's certificate. Half of the participants considered themselves expert users; the rest considered themselves intermediate, with one exception. One participant reported as a “novice” user but also reported having 3-5 years of computing experience. It was felt that this user had enough experience to be grouped with the intermediate users. One area of concern with the demographic spread lay in the age groups. All participants were aged between 18 and 30, and a future investigation covering a wider range of age groups may be justified.
5.5.1.2 Team Demographics
Figure 5.2 shows some selected characteristics of the team compositions. The teams were fairly homogeneous. This is not unexpected, since friends and colleagues are likely to be similar in many respects and the teams approached to participate were pairs of students usually involved in some sort of collaborative work or study. There were only 15 teams, with several factors contributing to team composition. It is therefore not possible to infer much about the general composition of collaborative groups, or to find any statistically significant correlation between team composition and collaboration performance. It is this researcher’s opinion that deeper investigation into the effects of social dynamics on collaborative use of a multi-touch table would be very useful.
Only 23% of the teams had one female and one male participant; 73% of the groups were of the same gender. Only two of the teams regarded themselves as being less than well acquainted. Eighty per cent of the teams were composed of participants with a similar education level. In addition, the remaining 20% were friends rather than colleagues, a possible reason for the difference. Computing expertise followed a very similar pattern to the education composition. Forty-three per cent of the teams comprised two users who self-reported as experts, and in 36% both members reported as intermediate users. Only 21% of the teams were heterogeneous with regard to computer expertise.
Figure 5.2: Team Demographics (n=15)
5.5.2 Performance Results
Performance results were calculated per-team since team members were collaborating to achieve the same goal (n=15). Tasks were considered to be completed when both team members moved on to the next task, whether the goal of the task had been achieved correctly or not. The task was then marked as being either completed successfully or failed (1 or 0). Figure 5.3 shows the proportion of teams who successfully completed 12, 13 and 14 tasks. The majority of the teams (60%) completed all 14 tasks without assistance. Four of the teams failed one task (27%) and two of the teams failed two tasks (13%).
No single task stood out as being failed markedly more often than the others; half of the tasks had at least one failure. Each of the eight failures occurred on a different task, except for Task 6, which had two failures. Task 6 required users to update the control with annotations. The possibly greater complexity of this task is discussed further with regard to time on task.
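The arithmetic behind these proportions can be checked with a short sketch. The per-team counts below are reconstructed from the figures reported in the text (nine teams with no failures, four with one failure, two with two failures); the function and variable names are illustrative only and are not part of the original analysis.

```python
from collections import Counter

TOTAL_TASKS = 14

# Tasks completed successfully by each of the 15 teams, reconstructed from
# the reported counts: 9 teams completed all 14 tasks, 4 teams failed one
# task and 2 teams failed two tasks.
completed_per_team = [14] * 9 + [13] * 4 + [12] * 2


def completion_proportions(completed):
    """Map 'tasks completed' to the percentage of teams, rounded to whole per cent."""
    counts = Counter(completed)
    n_teams = len(completed)
    return {tasks: round(100 * count / n_teams) for tasks, count in sorted(counts.items())}


def total_failures(completed, total_tasks=TOTAL_TASKS):
    """Count failed tasks across all teams."""
    return sum(total_tasks - c for c in completed)


proportions = completion_proportions(completed_per_team)
failures = total_failures(completed_per_team)
print(proportions)  # percentage of teams completing 12, 13 and 14 tasks
print(failures)     # total number of failed tasks
```

The reconstructed counts reproduce the reported 13%, 27% and 60% split and the eight failures mentioned above.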
Figure 5.3: Proportion of teams successfully completing x/14 tasks (n=15)
Figure 5.4: Time on task averaged across groups (n=15)
The time on task performance metric also shows that the functional requirements are well supported, with only one task taking appreciably longer than 80 seconds on average to complete.
[Figure 5.3 data: 12/14 tasks, 13% of teams; 13/14 tasks, 27%; 14/14 tasks, 60%]
[Figure 5.4 axes: Task Number (1 to 14) on the horizontal axis; Time on Task (s), 0 to 200, on the vertical axis]
Superimposed on the scatterplot in Figure 5.4 are columns representing the mean time on task for each task, along with error bars showing one standard deviation. The scatterplot shows that the distribution of the data points is skewed towards lower values. Tasks with a mean time on task of less than 40 seconds also have a relatively small spread of data points and a more even distribution. Most teams took slightly less than the average time for the tasks, with a few teams taking well above average for some tasks. This had the effect of pushing the mean up. Very few tasks had data points more than one standard deviation below the mean, but in all tasks there were data points more than one standard deviation above the mean. This was to be expected, as even very adept teams require some time to perform the task, while there was no defined upper time limit for teams that were struggling with a task. Figure 5.5 reinforces this with a line graph showing the cumulative task time of the best performing team, the worst performing team and the average. The graph demonstrates that the best team is not much better than average, although the worst team is considerably worse.
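The effect described above, where a few slow teams raise the mean and produce points above but not below one standard deviation, can be illustrated with a minimal sketch. The task times below are hypothetical values chosen only to show the pattern; they are not the measured data from the experiment, and the variable names are illustrative.

```python
from statistics import mean, median, stdev

# Hypothetical time-on-task values (seconds) for 15 teams on a single task.
# These are NOT the experimental data; they only mimic a right-skewed spread.
times = [30, 32, 33, 35, 36, 38, 40, 41, 43, 45, 48, 52, 60, 95, 150]

mean_time = mean(times)
median_time = median(times)
sd = stdev(times)  # sample standard deviation

# Count teams more than one standard deviation away from the mean.
below = sum(1 for t in times if t < mean_time - sd)
above = sum(1 for t in times if t > mean_time + sd)
faster_than_mean = sum(1 for t in times if t < mean_time)

print(f"mean={mean_time:.1f}s median={median_time}s sd={sd:.1f}s")
print(f"below one sd: {below}, above one sd: {above}")
print(f"teams faster than the mean: {faster_than_mean} of {len(times)}")
```

In this hypothetical sample the mean exceeds the median, most teams are faster than the mean, and the only points outside one standard deviation lie on the slow side, which is the same shape reported for the observed task times.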
Figure 5.5: Cumulative task times for best team, worst team and the average (n=15)
[Figure 5.5 data: cumulative task time (s) over Tasks 1 to 14. Fastest team: 467 s; mean: 562 s; slowest team: 892 s]
The task times were very good, especially considering that 86% of the participants had never used a large multi-touch surface before. Nine of the 14 tasks (64.3%) took less than 45 seconds to complete. From Table 5.3 it can be seen that the fastest tasks involved folder operations: creating and manipulating folders and adding and removing items. This indicates that the drag-and-drop interaction technique for folder operations was particularly efficient.
Table 5.3: Summary of tasks and tested functional requirements
Task summary | Mean Time on Task (s) | Functional Requirement
1. Create a menu widget and choose a colour for each user. | 28.1 | 10, 11
2. Create 2 folders | 42.1 | 6
3. Resize these folders, move to central location. | 29.0 | 6, 12
4. Search for “SAICSIT 2011”. | 78.6 | 1, 2, 3
5. Browse the results and open webpages | 42.4 | 4, 5
6. Annotate results. | 111.5 | 7
7. Place the webpages in folder. | 33.6 | 6
8. Search for “NMMU logo” and open two different versions of the logo. | 68.6 | 1, 2, 3, 4, 5
9. Rate images | 73.3 | 7
10. Add the images to folder. | 34.9 | 6
11. Search for “Cape Town” to find out more information. Find at least 2 places of interest, photos or articles about the town that interest you both. | 80.1 | 1, 2, 3, 4, 5, 9, 10
12. Add your chosen places of interest to the Travel folder. | 28.4 | 6
13. Delete the lower rated image from the Images | |
14. Export the data and the annotations | 41.6 | 11, 12

The slowest tasks to complete were tasks 4, 6, 8, 9 and 11, which all took more than 60 seconds to complete on average. It can be seen from Figure 5.4 that Tasks 6 and 9 also had a large spread of times and a resulting very large standard deviation. As shown in Table 5.3, Tasks 6 and 9 tested functional requirement 7 which stated:
Allow users to update or add value to shared information.
Task 6 asked users to annotate the controls with a note about the information that could be found inside them. Task 9, which was completed more quickly, asked users to rate the controls. Tasks of this kind are more time-consuming than most others, but the high variation in task time shows that some users clearly found them much easier than others. One likely explanation is the difficulty that many users had with the on-screen keyboard, which is discussed in Section 5.5.5. However, at least one participant found the annotations useful, stating that he found them the most positive aspect of the system.
Tasks 4, 8 and 11 involved collaborative search. The three tasks had increasing levels of complexity, as shown by the number of functional requirements they were testing, given in Table 5.3. Task 4 required the team to search the information space and have the results presented to them. Task 8 asked them to search, browse through the list and open the result. Task 11 was a complete, open-ended task asking them to communicate with each other to find a result in which they were both interested. The task times were about the same, but the standard deviation was smaller for Task 11, which was probably a result of more experience in using the search function.
It should be noted that the slowest five tasks all required data input using the on-screen keyboard. This highlights a potential weakness in using this method of data input. The on-screen keyboard also received some negative qualitative comments, which are discussed in Section 5.5.5.
5.5.3 Satisfaction Results
User satisfaction was evaluated using a post-test questionnaire (Appendix F). The questionnaire had 30 questions with responses given on 7-point Likert scales. These questions were divided into four sections, with a fifth section providing qualitative feedback.
Section A solicited responses about cognitive load. Five of the six questions were more simply phrased with a lower Likert score implying a better result, for example:
How physically demanding were the tasks? Very Low – Very High
All the questions in Section A had their scores standardised by inversion after collecting the data, with the exception of Question A.4 which asked users to comment on their performance. The questions in sections B, C and D were all phrased so that high Likert ratings meant high usability. The ratings are thus all standardised such that a higher Likert rating implies a better result, over the whole questionnaire.
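The standardisation step can be sketched as follows. The text does not state the exact formula used, so this assumes the conventional reversal for a 1 to 7 scale (inverted score = 8 minus the raw score); the question keys such as "A1" and the function names are hypothetical labels used for illustration.

```python
SCALE_MIN, SCALE_MAX = 1, 7

# Assumption: Question A.4 (performance) is the only Section A item that is
# already phrased so that a higher rating is better, as described in the text.
NOT_INVERTED = {"A4"}


def invert_likert(score, low=SCALE_MIN, high=SCALE_MAX):
    """Reverse a Likert rating so that a low raw score becomes a high one."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


def standardise_section_a(responses):
    """Invert every Section A response except those already positively phrased."""
    return {
        question: score if question in NOT_INVERTED else invert_likert(score)
        for question, score in responses.items()
    }


# Hypothetical raw responses from one participant (not real data).
raw = {"A1": 2, "A2": 3, "A3": 1, "A4": 6, "A5": 4, "A6": 2}
print(standardise_section_a(raw))
```

After this step a rating of 7 is the best possible result on every question, which is what allows Section A to be compared directly with Sections B, C and D.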
Cronbach’s alpha coefficients for reliability were calculated for each questionnaire section (Table 5.4). Cronbach’s alpha coefficients greater than or equal to 0.70 are regarded as sufficient evidence of adequate reliability (Nunnally, 1978). The observed Cronbach’s alpha coefficients were all in this interval, which confirms the reliability of the scores for each section. In addition, Sections B, C and D were considered together, as they were all related to usability as opposed to cognitive load. These sections taken as a whole also had an acceptable level of reliability.
Table 5.4: Cronbach's alpha coefficients for reliability for each section (n=30)
Questionnaire Section Cronbach's alpha
A: Cognitive Load 0.70
B: Overall Satisfaction 0.77
C: Usability 0.92
D: Collaboration 0.89
B + C + D 0.86
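For reference, Cronbach's alpha for a section with k questions is conventionally computed as k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the respondents' total scores). The sketch below implements that standard formula; it is not the statistical package used for the thesis analysis, the ratings shown are hypothetical, and the function name is illustrative.

```python
from statistics import variance

RELIABILITY_THRESHOLD = 0.70  # threshold cited in the text (Nunnally, 1978)


def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    responses: one row per respondent, each row a list of k item scores.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same number of items")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from four respondents on two items (not real data).
example = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(example)
print(round(alpha, 2), alpha >= RELIABILITY_THRESHOLD)
```

Applied to the standardised responses of one questionnaire section, a function of this kind would yield a single coefficient per section, comparable to those listed in Table 5.4.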
Figure 5.6 shows the results of the user satisfaction questionnaire per section, together with a summary of Sections B, C and D. The mean, median, mode and standard deviation of the 7-point Likert ratings are listed for each section. The means for each section were greater than 5 and the median rating was 6 in all sections. The most frequent rating given in Sections B and D, and in Sections B, C and D combined, was 7, which was the highest possible rating. Overall this constitutes a very positive result from the user testing.
Figure 5.6: User Satisfaction Results Summary (n=30)
From Figure 5.6 it can be observed that the system was most highly rated in Section D with a mean of 6.17 and the lowest observed standard deviation of 0.67. This section asked users to report on Co-IMBRA’s support for collaboration. The mode for this section was 7.00. This is a very encouraging result as collaboration was the focus of the system. Section B was the second most highly rated section, with a mean of 6.00. This section also had a mode of 7.00. Users clearly found the system highly satisfying to use. Section C, the section on usability, was also highly rated with a mean of 5.90, although the mode in this case was 6.00. This is still a very high rating although one or two minor usability issues were found.
The graph shows that the system was rated lowest on cognitive load, with a mean of 5.45. This is consistent with what was observed as well as with the qualitative user feedback. Some users found the system to be rather physically demanding and exhibited signs of fatigue during the test. This section also had the highest standard deviation (0.88), which indicated that there was a greater difference of opinion among users with regard to cognitive load. Figure 5.7 shows the means of each of the questions about cognitive load with error bars of one standard deviation.
[Figure 5.6 data: User Satisfaction Results, 7-point Likert scale rating by questionnaire section]
Questionnaire Section | Mean | Median | Mode | Std Dev
A: Cognitive Load | 5.45 | 6.00 | 6.00 | 0.88
B: Satisfaction | 6.00 | 6.00 | 7.00 | 0.79
C: Usability | 5.90 | 6.00 | 6.00 | 0.85
D: Collaboration | 6.17 | 6.00 | 7.00 | 0.67
B + C + D | 6.02 | 6.00 | 7.00 | 0.68
Figure 5.7: Mean 7-point Likert scale ratings for Section A: Cognitive Load (n=30)
Participants gave Co-IMBRA the lowest mean rating (4.83) with regard to cognitive load on the question:
“Effort: How hard did you have to work to accomplish your level of performance?”
Other questions that received a relatively poor rating were those asking about the physical and mental load the system placed on the user. This is likely to be due to the physical effort required to use the large screen and gesture interaction. It should also be noted that these three questions have comparatively large standard deviations. This implies that some users found the system to place a large cognitive load on them whilst others did not.
[Figure 5.7 data, mean Likert scale rating: Mental Load 5.27; Physical Load 5.37; Temporal Load 5.70; Performance 5.70; Effort 4.83; Frustration 5.83]
Figure 5.8: Mean Likert scale ratings for Section B: Overall Satisfaction (n=30)
Figure 5.8 shows the mean Likert ratings for the four questions of Section B, which rated the system on how satisfying it was to use. Error bars showing one standard deviation are also displayed. The graph shows that the system was rated particularly highly on learnability (mean = 6.27). This was corroborated by qualitative feedback, in which 40% of users commented that the system was “intuitive” or “easy to learn”. The other three questions also had very high means and low standard deviations. The highest standard deviation occurred with ease of use, implying that some users were less satisfied than others that the system was easy to use.
Section C of the questionnaire asked participants for ratings on usability, which gave more insight into the ease of use of the system. As can be seen in Figure 5.9, mean Likert ratings for usability were all above 5.50, with the lowest scores achieved for speed – a mean of 5.66 for the speed of the system and a mean of 5.73 for the speed of browsing. This result was confirmed by the 20% of participants who also remarked that the system was not as fast as they expected. Any amount of lag was not well tolerated by the participants, who expected the virtual objects to behave exactly as tangible objects would. This is discussed further under Section 5.5.5.
The next-lowest mean rating was for the question:
“The system had all the functionality I would expect from a collaborative information system.”
[Figure 5.8 data, mean Likert scale rating: Overall 5.93; Ease of Use 5.83; Learnability 6.27; Simplicity 5.97]
From the error bars of one standard deviation displayed in Figure 5.9, it is evident that this question also had one of the largest standard deviations. The users clearly had differing expectations of the functionality of the system. However, the mean result of 5.83 showed overall satisfaction. The only specific request for missing functionality was a checklist accessible to both users, to allow them to keep track of the task list they were following.
Section D asked for Likert ratings for the collaborative functionality. These questions directly corresponded to the twelve functional requirements given in Section 2.6.5.
Figure 5.9: Mean Likert scale ratings for Section C: Usability (n=30)
Figure 5.10 shows the mean Likert scale ratings for the section of the questionnaire dealing with collaborative functionality together with error bars representing one standard deviation. Two thirds of the questions had Likert ratings above 6.00. The highest rated functions were adding annotations and opening controls. One user commented that the annotations were the most positive feature of Co-IMBRA. Also highly rated were functions to assist with collaboration, such as providing shared access, division of workload and enabling effective communication.
[Figure 5.9 data, mean Likert scale rating: Effective 6.17; Quick 5.66; Efficient 5.97; Productive 6.03; Functionality 5.83; Internet Browsing 5.93; Quick Browsing 5.73; Efficient Browsing 5.87]
Figure 5.10: Mean Likert scale ratings for Section D: Collaboration (n=30)
The lowest rated three questions were:
“The system effectively visualised the search results and shared information.”
“We could effectively browse the visualised information.”
“We could effectively manipulate the controls.”
Although the ratings were still high, these three questions are clearly related and may represent a minor area of concern with the system. Users seemed to rate the system lowest on the way it visualised the search results and the way they browsed those results. The system displayed previews of the results in a list which could be browsed with gestures (a flick up and down) or by using a scroll bar, a traditional Windows metaphor. Several users preferred to use the scroll bar instead of the gesture to browse the results, perhaps because it was more familiar to them. The scroll bar is more difficult to control with touch than with a mouse and this could have been the main problem.
[Figure 5.10 data, mean Likert scale rating: Provides shared access 6.17; Search effective 5.97; Visualisation effective 5.86; Browse visualisations 5.93; Opening objects 6.53; Effectively manipulate 5.90; Add annotations 6.53; Aware of teammate actions 6.07; Communicated effectively 6.38; Assists to divide workload 6.33; System logged actions 6.33; Locate previously found info 6.21]
5.5.4 Collaboration Results
Figure 5.11: Collaboration styles exhibited by the teams (n=15)
[Figure 5.11 data, Collaboration Styles: Serial 60%; Parallel 20%; Assembly Line 20%]
Collaboration style was classified into the three types suggested by Morris and Winograd (2004). With a sample size of 15 it is not possible to make accurate inferences about correlations between demographic groups and collaboration style; however, some trends were observed. Figure 5.11 shows the proportion of teams that exhibited each of the three collaboration styles. Serial collaboration was the most common, with users choosing to work together on the same task. This collaboration style was found amongst all team compositions. Parallel collaboration occurred in 20% of the groups. Interestingly, all three teams that exhibited parallel collaboration consisted of teammates with mixed education level, mixed computing abilities or both. This could indicate that the group was making the best use of their diversity of skills. Assembly-line collaboration also occurred in 20% of the groups. These groups chose to work on different aspects of the same task, such as one performing searches and then passing all the search results to the other team member. The first team member was then free to perform the next search while his or her teammate browsed the results. Again, an interesting trend was observed: the three teams that exhibited this kind of collaboration consisted entirely of expert computer users. Two of the three assembly-line groups had teammates who were both postgraduates; the other team’s members were both undergraduates. This type of collaboration could be chosen to improve productivity when both users are of a similar, high skill level.
It would appear that serial collaboration is most intuitive when a team is faced with a problem that needs to be solved together. Users tended to begin by approaching the