Category 4: System Response (Output)
2.9 Summary
We presented a literature review of gestures as an interaction technique, organised using the concepts that we identified and coded using content analysis techniques. Continual analysis of the categories as we accumulated data from the literature informed our classification of gesture interaction systems into four categories that represent components of gesture interactions: gesture style, application domain, enabling technology, and system response. Over 40 years of computer research on gestures suggests that they may provide a natural, novel and improved interaction technique. It also suggests that, although perceptual input is less reliable than direct input devices, gestures maintain their presence in the research. Speech interactions are included as standard features of the Microsoft Windows and Mac operating systems, so we ask: why not gestures? This problem was discussed by Buxton et al. (1983), who noted that for gesture interactions there exists
a perceived discrepancy between the apparent power of the approach and its extremely low utilisation in current practice.
This observation remains relevant today, when so much is done in theory, yet so little is applied in practice. In the next chapter, we present a study that begins to uncover some of the defining characteristics of the categories presented in this chapter.
“Our Age of Anxiety is, in great part, the result of trying to do today’s jobs with yesterday’s tools.”
Marshall McLuhan
3.1 Introduction
An initial motivation for our research was the iGesture system, an extensible platform for conducting gesture research developed at the University of Southampton (Hare et al., 2005). For several months, we studied the iGesture system, measuring its performance, gesture recognition capacity, and functionality. Results were generally positive; however, our knowledge was limited to this one system. Before making changes to improve the system, we would first have to understand what we wanted to improve and how we could improve it. To do this, we stepped away from the technology and considered the human perspective, to learn whether gestures could indeed enhance interactions. Our review of the literature revealed that semaphoric gestures were common in research, yet rarely seen outside the laboratory. In addition, several researchers had suggested that hand-based signs, or semaphoric gestures, are not natural and are an infrequent form of human gesture (Wexelblat, 1998; Quek et al., 2002; Kettebekov, 2004).
Wexelblat (1998) in particular challenged the use of gestures as a viable interaction controller. He argued that they could be physically taxing and achieved at best 90% recognition accuracy. Although we would not normally expect people to settle for 90% recognition accuracy, we challenged the conclusion that gestures are therefore not viable. We undertook a study to investigate gestures and their functionality using the following approach:
1. We investigated a functional utility (a value to the interaction) for semaphoric gestures.
2. We considered different contexts and compared the effects of using gestures for secondary tasks.
We approached this phase of the research using the Wizard of Oz (WoZ) methodology. The term was coined by J. F. Kelley (2002) to describe an approach in which the intelligent behaviour of a computer is controlled by a person who remains out of view of the participants. The term was inspired by the film The Wizard of Oz, and the approach is an efficient way to investigate interactions without having to deal with many inhibiting system constraints. We chose this method to avoid the constraints that iGesture could impose on the experiment, owing to its sensitivity to lighting changes and its tracking limitations. Our experience with iGesture and the user studies we conducted confirmed our initial view that gestures would be useful for controlling music applications. This informed our approach to this study. Its results suggest that gestures offer significant benefits over function keys for secondary task interactions in multitasking situations. We defined secondary tasks as non-critical tasks that place little cognitive demand on users. This result supports the goals of notification system interactions, where multitasking is a key feature of the interaction and the aim is to reduce distraction to users. The result also inspired a collaborative effort with Virginia Tech to investigate gestures for notification system interactions, discussed in Chapter 6. The remainder of this chapter describes the studies conducted on iGesture and the experiment to determine the functionality of semaphoric gestures. A short version of the experiment appeared at the 2005 ACM Conference on Human Factors in Computing Systems (CHI) (Karam & schraefel, 2005a).
3.2 iGesture
We report on several studies investigating the iGesture platform, which supported some of the initial research conducted for this dissertation. iGesture played an integral part in guiding our qualitative and formative studies, and is currently being redesigned for future research and development. We next present details of the iGesture system and the experiments we conducted to determine three things: its accuracy rates, its capacity to recognise gestures, and the different types of gestures it could recognise. We also discuss some lessons learnt from our experience working with gestures and iGesture.
Figure 3.1: Multiple users controlling a music application using iGesture and a coloured marker to assist with tracking.
3.2.1 iGesture Platform
The iGesture platform is a tool for implementing multimodal gesture-based interactions in multimedia contexts. It is a low-cost, extensible system that uses visual recognition of movements to support gesture input. Computer vision techniques support interactions that are lightweight, with minimal constraints. The system recognises gestures executed at a distance from the camera. This enables multimodal interaction in a naturalistic, transparent manner across many application domains, including desktop, ubiquitous and CSCW computing environments (see Figures 3.1 and 3.2). In addition, iGesture can process raw visual input to control MIDI devices. For our research, we focused on semaphoric gestures for controlling software application tasks. iGesture is scriptable and can map a gesture onto any command-line function or statement. While this work was exploratory, our experience with iGesture contributed to our understanding of many characteristics of the gesture interactions that we investigated in this research. We investigated several gesture recognition systems in the literature (Westeyn et al., 2003; Dannenberg & Amon, 1989; Henry et al., 1990), but none matched the flexibility and ease of use of iGesture. iGesture was easy to configure, and gestures were easily trained and mapped onto tasks. This allowed us to discover many gestures that it could recognise. We discuss our observations and experience from using iGesture next.
Application controls. iGesture runs on the Mac OS X operating system and can control most software applications. For example, in iTunes, an AppleScript can be written to control most tasks, such as play, pause, stop and volume, as well as more complicated tasks such as managing playlists. We interacted with other applications, including Winamp, QuickTime, Microsoft Office, and a jukebox application written by Max Wilson. A screenshot of the iGesture system is presented in Figure 3.3.
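The gesture-to-command mapping described above can be sketched in a few lines. This is a minimal sketch, not iGesture's actual configuration mechanism. The gesture names and the `GESTURE_BINDINGS`, `command_for` and `dispatch` helpers are hypothetical. It assumes each recognised gesture is bound to an AppleScript statement run through macOS's `osascript` tool.

```python
import subprocess
from typing import Dict, List, Optional

# Hypothetical gesture names and bindings; the actual iGesture
# configuration format is not described in the text.
GESTURE_BINDINGS: Dict[str, str] = {
    "circle_clockwise": 'tell application "iTunes" to playpause',
    "swipe_right": 'tell application "iTunes" to next track',
    "swipe_left": 'tell application "iTunes" to previous track',
    "swipe_up": 'tell application "iTunes" to set sound volume to (sound volume + 10)',
    "swipe_down": 'tell application "iTunes" to set sound volume to (sound volume - 10)',
}


def command_for(gesture: str) -> Optional[List[str]]:
    """Build the osascript command line bound to a recognised gesture."""
    script = GESTURE_BINDINGS.get(gesture)
    if script is None:
        return None  # unbound gestures are ignored
    return ["osascript", "-e", script]


def dispatch(gesture: str, dry_run: bool = True) -> Optional[List[str]]:
    """Return the command for `gesture`, and run it unless dry_run is set."""
    cmd = command_for(gesture)
    if cmd is not None and not dry_run:
        # macOS only: osascript executes the AppleScript statement.
        subprocess.run(cmd, check=True)
    return cmd
```

On a Mac with iTunes, `dispatch("swipe_right", dry_run=False)` would skip to the next track. On macOS 10.15 and later, iTunes was replaced by the Music app, so the application name in the scripts would need to change. Keeping the bindings in a plain dictionary reflects the point made above: any command-line statement can be attached to a gesture.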
Figure 3.2: Training the iGesture system on a large screen display.
Different gestures presented to participants during the training sessions to provide visual cues for each gesture.
Gesture sets. The iGesture system is designed to track directional motion and can recognise combinations of horizontal, vertical, and circular motions. Figure 3.4 shows a diagram of one set of gestures we tested. Gestures can be programmed using two channels, so that the system can recognise identical gestures for both left- and right-hand interactions. We used two differently coloured objects to track movements. Hand recognition was possible; however, it was more effective to use a brightly coloured object, since there was less chance of similar colours in the background being picked up accidentally. We discuss recognition next.
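To show how combinations of horizontal, vertical and circular motions can be told apart, here is a sketch of a simple direction-based recogniser. It is not iGesture's algorithm, which is documented elsewhere (Hare et al., 2005). It assumes that the tracked marker yields a list of (x, y) image coordinates. Each step is quantised into one of four directions, and the resulting sequence is matched against hypothetical templates. In a two-channel set-up, one such recogniser would run per marker colour.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical templates; image y grows downward, so a clockwise circle
# started at the top moves right, then down, then left, then up.
TEMPLATES: Dict[str, List[str]] = {
    "right": ["R"], "left": ["L"], "up": ["U"], "down": ["D"],
    "clockwise": ["R", "D", "L", "U"],
    "counter_clockwise": ["L", "D", "R", "U"],
}


def quantise(points: List[Point], min_step: float = 5.0) -> List[str]:
    """Turn a marker trajectory into a sequence of directions R, L, U, D,
    collapsing consecutive repeats and ignoring steps shorter than min_step."""
    dirs: List[str] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < min_step:
            continue  # treat small movements as tracking jitter
        if abs(dx) >= abs(dy):
            d = "R" if dx > 0 else "L"
        else:
            d = "D" if dy > 0 else "U"
        if not dirs or dirs[-1] != d:
            dirs.append(d)
    return dirs


def recognise(points: List[Point], min_step: float = 5.0) -> Optional[str]:
    """Return the name of the template that exactly matches, or None."""
    seq = quantise(points, min_step)
    for name, template in TEMPLATES.items():
        if seq == template:
            return name
    return None
```

For example, a roughly square clockwise loop such as `[(0, 0), (20, 0), (20, 20), (0, 20), (0, 0)]` quantises to `["R", "D", "L", "U"]` and is recognised as `"clockwise"`.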
3.2.2 System Performance
While iGesture recognises a large set of gestures, several issues lead to poor recognition. First, variations in lighting require the hue and saturation levels to be adjusted to reflect the changes in light. Second, gestures with similar trajectories can lead to incorrect recognition. Third, recognition performance decreases with each additional gesture trained into the system. Implementation details of the iGesture system, along with links to demonstration videos, are provided in three places: a technical report by Hare et al. (2005), the project web site (Karam & Hare, 2004), and Appendix B of this dissertation. We next discuss the studies conducted to determine iGesture's accuracy rates.
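The dependence on hue and saturation levels can be illustrated with a sketch of colour-marker detection. This is an assumption-laden illustration, not iGesture's implementation. A frame is modelled as a list of rows of RGB tuples. Pixels whose hue lies in a range and whose saturation and brightness exceed thresholds are counted as the marker, and their centroid is returned. The function name `find_marker` and all threshold values are hypothetical. Hue wrap-around for red markers is not handled.

```python
import colorsys
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def find_marker(frame: List[List[RGB]], hue_range: Tuple[float, float],
                min_sat: float, min_val: float = 0.2) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of marker-coloured pixels, or None.

    Hue, saturation and value are in [0, 1] as returned by colorsys.
    """
    lo, hi = hue_range
    sum_x = sum_y = 0.0
    count = 0
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if lo <= h <= hi and s >= min_sat and v >= min_val:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # marker lost, e.g. thresholds no longer match the lighting
    return (sum_x / count, sum_y / count)
```

A saturated green pixel `(0, 255, 0)` has hue 1/3 and saturation 1.0. The same marker under washed-out light, for example `(150, 200, 150)`, keeps hue 1/3 but drops to saturation 0.25. A `min_sat` of 0.5 would therefore lose it, while 0.2 would not. This mirrors the need to retune the levels as the lighting changes.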
3.2.3 Measuring Performance Accuracy: Experiment
To check the accuracy of the semaphoric gesture recognition subsystem, we performed an evaluation under controlled conditions. The system was set up in a room with fixed lighting, and the camera was positioned to cover as much area as possible. The fixed lighting conditions mimic the office environment in which the system is currently deployed. The evaluation assessed the system's accuracy using three measures: the percentage of correctly recognised gestures, the percentage of false positives (gestures incorrectly recognised) and the percentage of false negatives (gestures not recognised). A gender-balanced group of 8 volunteers was assembled for the evaluation in order to assess performance over a group of potential users.
Table 3.1: Averaged results from each part of the evaluation
Condition                       Average Correct   Average Incorrect   Average Missed
Pre-trained, Centred                  93%                 4%                  3%
Pre-trained, Non-Centred              84%                 5%                 12%
Subject-trained, Centred              91%                 3%                  7%
Subject-trained, Non-Centred          87%                 2%                 11%
Method. The evaluation was performed in two parts. First, the system was loaded with a set of pre-trained gestures, illustrated in Figure 3.4; for this experiment, however, we used only single-handed gestures. Second, the participants were asked to train the system to recognise their gestures before the evaluation commenced. These two parts allowed us to evaluate the effect of user-trained versus pre-trained gestures on recognition accuracy. Both parts of the evaluation consisted of two sub-parts. In the first sub-part, the subjects were asked to perform each of the gestures 5 times in a stationary position directly in the centre of the camera's field of view. In the second sub-part, five different points in the room were pre-selected to cover the full visual field of the camera. The participants were then asked to perform the gestures at each point while facing the camera. The results showed no statistically significant intra- or inter-subject variability, so they have been averaged and are shown in Table 3.1.
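The three measures and the averaging step behind Table 3.1 can be written down explicitly. The sketch below assumes that each participant's trials in one condition are tallied as correct, incorrect (false positive) and missed (false negative). It then converts the tallies into percentages and averages them across participants. The `Counts` type and the helper names are illustrative. The example counts in the test are invented for checking the arithmetic and are not data from the study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

MEASURES = ("correct", "incorrect", "missed")


@dataclass
class Counts:
    correct: int    # gesture recognised as the one performed
    incorrect: int  # recognised as a different gesture (false positive)
    missed: int     # not recognised at all (false negative)

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.missed


def rates(c: Counts) -> Dict[str, float]:
    """Percentages of one participant's trials falling in each outcome."""
    return {m: 100 * getattr(c, m) / c.total for m in MEASURES}


def average_rates(per_participant: List[Counts]) -> Dict[str, float]:
    """Mean of the per-participant percentages for each outcome."""
    per = [rates(c) for c in per_participant]
    return {m: mean(r[m] for r in per) for m in MEASURES}
```

When every participant performs the same number of trials, averaging per-participant percentages gives the same result as pooling all trials. Rounding each averaged value to a whole percentage explains why some rows of Table 3.1 sum to 101%.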
Results. Extensive use of the iGesture system showed us which semaphoric gestures the system recognised best. Since iGesture tracks movements in four directions, we had to work around conflicts between gestures that follow a similar initial trajectory. The pairs right and clockwise, left and counter-clockwise, and up and stop were the most confusing to the system. Users also noticed these conflicts, became frustrated with them, and requested that the gestures be changed. The primary researcher became proficient at all of the gestures after extended periods of use and could avoid these conflicts using various strategies. Novice users could not do this, because they lacked the experience to develop strategies for improving recognition. Users needed considerable time to become proficient at performing gestures, and changes in lighting throughout the day would affect tracking. We therefore continue to use the WoZ methodology, which lets us focus on the interactions rather than on the system.
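One way to see why these pairs collide is to encode each gesture as a sequence of directions and look for shared beginnings. In the sketch below, each gesture is a list of quantised directions (R, L, U, D) and the templates are hypothetical. It lists every pair in which one gesture's sequence is a proper prefix of another's. A recogniser that commits as soon as a match appears may then report the shorter gesture. The stop gesture's trajectory is not described in the text, so the up/stop pair is not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical direction templates (image y grows downward).
TEMPLATES: Dict[str, List[str]] = {
    "right": ["R"], "left": ["L"], "up": ["U"], "down": ["D"],
    "clockwise": ["R", "D", "L", "U"],
    "counter_clockwise": ["L", "D", "R", "U"],
}


def prefix_conflicts(templates: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where a's sequence is a proper prefix of
    b's, so an early-committing recogniser may report a instead of b."""
    conflicts = []
    for a, seq_a in templates.items():
        for b, seq_b in templates.items():
            if a != b and len(seq_a) < len(seq_b) and seq_b[:len(seq_a)] == seq_a:
                conflicts.append((a, b))
    return sorted(conflicts)
```

With these templates, the function reports left/counter-clockwise and right/clockwise, which matches two of the pairs that users found most confusing.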
3.2.4 iGesture and Manipulative, Free-form Gestures
iGesture also supported the direct transfer of visual images to MIDI input, which we used to simulate an air guitar. Figure 3.5 shows a participant using the air guitar feature of the iGesture platform. With this application, we were able to design a creative interface, which we mostly used to demonstrate the more playful side of gesture interactions. While this interaction provided a great deal of enjoyment for the researchers and for visitors to the University, we include it here to stress the flexibility of the interactions possible with iGesture.
Figure 3.5: A participant enjoying the air guitar interaction during a free-form gesture investigation.
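The text does not say how visual input was converted to MIDI, so the following is only a hypothetical sketch of one possible mapping. It assumes the tracked marker's vertical position selects the pitch, with higher on screen giving a higher note. The pitch is then packaged as a standard three-byte MIDI Note On message: status byte `0x90` combined with the channel, followed by the note number and velocity, each in 0 to 127. The note range and velocity defaults are arbitrary choices.

```python
def y_to_note(y: float, frame_height: float,
              low_note: int = 40, high_note: int = 76) -> int:
    """Map the marker's vertical position (0 = top of frame) to a MIDI note
    number; positions outside the frame are clamped to the range ends."""
    frac = 1.0 - min(max(y / frame_height, 0.0), 1.0)
    return round(low_note + frac * (high_note - low_note))


def note_on(note: int, velocity: int = 100, channel: int = 0) -> bytes:
    """Build a MIDI Note On message: status 0x90 | channel, note, velocity."""
    if not (0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("MIDI data byte or channel out of range")
    return bytes([0x90 | channel, note, velocity])
```

For a 480-pixel-high frame, a marker at the vertical midpoint maps to note 58, and `note_on(58)` yields the bytes `0x90, 58, 100`. Sending these bytes to a synthesiser would require a MIDI output library, which is outside this sketch.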
3.2.5 Experience Report and Research Motivation
While extensive use of the iGesture system leads to improved performance on gestures, the system is still a prototype and is currently being redesigned for use on multiple platforms. We will also implement different recognition algorithms and techniques for future testing, such as tracking shapes in addition to colour. We continue to use the system, discovering novel ways of performing gestures and working around the performance issues. Other gesture recognition systems and techniques can provide more robust recognition than iGesture, and we could redesign the recognition process. However, before embarking on this task, we wanted to ensure that there would be a functional utility for these interactions. We therefore decided to learn more about the human perspective of gesture interactions, and the scenarios in which gestures could benefit the user. In the next section, we present research that attempts to determine whether there is indeed a functional utility for semaphoric gestures, and in what contexts.