Part I – Background
2 Theories and frameworks on tangible and gesture interaction
2.4 Tangible Gesture Interaction
2.4.1 Abstracting Frameworks
In 1997, Robertson [176] proposed a characterization of embodied actions for Computer Supported Cooperative Work (CSCW). The taxonomy, derived from observing designers collaborating on the conception of a computer game, identifies three categories of embodied actions: actions performed in relation to physical objects, to other bodies, or to the physical workspace. Physical objects are highlighted as an important resource for the users, and moving these objects is a typical action that users carry out. Robertson also highlighted that other typical actions with objects are gestures performed to communicate with other users, in particular to emphasize specific properties of the object.
In her PhD thesis, Karam [108] described tangible gestures as a particular class of manipulative gestures, i.e., gestures performed with physical objects to manipulate physical or digital objects.
Among the examples, she reported systems where users manipulate a digital object through a physical artifact (e.g., Hinckley et al.’s manipulation of a doll’s head to visualize brain sections [88]) and systems where the manipulation of a physical object generates, through computer interpretation, the movement of another object (for example, a robotic arm). She also considered as tangible gestures the free movements of the body performed to control an environmental output, such as in artistic performances, where a computer interprets the dancer’s movements (gestures) to augment the performance with a digital output (e.g., music or light projection).
The first formalization of gesture interaction with objects dates to 2008, when Vaucelle and Ishii defined Gesture Object Interfaces [217]. The authors used a semiotic square to oppose Gesture Object Interfaces to classical GUIs. In Gesture Object Interfaces, the user performs gestures while manipulating physical objects, in contrast to GUIs, where neither gestures nor object manipulations are exploited to interact with the information. Gesture Object Interfaces also differ from Gesture Recognition, in which the user’s hand directly controls the digital information, and from Tangible User Interfaces, in which the information is directly controlled through the manipulation of physical objects. Gesture Object Interfaces, instead, are intended to animate objects with meaningful gestures, introducing what the authors called “an identity reinvention”. For Vaucelle and Ishii, the powerful metaphors associated with gestures can animate objects or give them a new identity. At the same time, interacting with physical objects facilitates spatial cognition, which is absent in free-hand gesture interaction.
Figure 16. Gesture Object Interface semiotic square by Vaucelle and Ishii [217].
In 2011, Hoven and Mazalek [213] presented the first definition and characterization of Tangible Gesture Interaction (TGI): “the use of physical devices for facilitating, supporting, enhancing, or tracking gestures people make for digital interaction purposes. In addition, these devices meet the tangible interaction criteria”. In their paper, they also presented a review of gesture interaction, distinguishing between (free-hand) gestures in 3D space, gestures on surfaces, and gestures with physical objects, and specifying that all systems in the last category can be considered examples of tangible gesture interactive systems. In this category, they found examples of gestures with mobile devices, gestures with batons and wands, gestures with game controllers and remotes, gestures with dolls, toys and props and, finally, gestures with custom tangibles. For this last group, Hoven and Mazalek noticed that researchers often build their own objects with which the user can interact through gestures, or they build small sensors that can be attached to existing physical objects to transform them into interactive objects.
Besides Vaucelle and Ishii’s [217] and Hoven and Mazalek’s [213] groundwork on tangible gesture interaction, some frameworks have been proposed for specific gestures with objects. Wolf et al. [231] proposed a taxonomy of microgestures, i.e., gestures performed as secondary tasks while grasping another object. The taxonomy was elicited from experts according to ergonomic criteria, considering three different types of grasps: palm (a wrap grasp around the steering wheel), pad (a precision grasp for a credit card) and side (a dynamic tripod grasp for writing with a pen). The experts found 17 different gestures for the palm grasp, and 2 gestures for the pad and the side grasp. Additionally, each gesture can be performed with different fingers, and some gestures can be performed with different acceleration and duration (obtaining a touch, a tap, or a press, according to these parameters).
Figure 17. Wolf et al.'s [231] taxonomy of microinteractions for palm grasp (A), pad grasp (B) and side grasp (C).
Wimmer [228] proposed a framework for grasp, which could be used either for implicit (e.g., activity recognition) or for explicit (e.g., gesture recognition) interaction. Wimmer considered grasps as static and stable hand configurations on an object, ignoring all the complex hand and finger movements that are necessary to reach this configuration. The proposed GRASP model aims at describing the meaning of a grasp through five elements: Goal, Relationship, Anatomy, Setting, Properties. Goal deals with the purpose of the grasp from the user’s perspective: a grasp could be performed either with a primary purpose or with a supportive purpose within another task, and could be used to communicate either implicitly or explicitly with the computer. Relationship addresses all the non-physical properties of the user and of the object to grasp, i.e., the user’s personal beliefs towards an object. Anatomy relates to the particular physical properties of the user’s body, which can differ from person to person. Finally, Setting and Properties relate to how the object is positioned in the physical context and to the physical properties it has (surface texture, size, form, etc.).
Figure 18. Wimmer's GRASP model [228].
Valdes et al. [209] proposed a taxonomy for tangible gestures with active tokens. Using Sifteo cubes, small interactive cubes with a touchscreen and inertial sensing, they conducted a gesture elicitation study to understand which gestures users would use to make queries in a large database through interactive tokens. The taxonomy classifies gestures with active tokens along three dimensions: space (of the interaction), flow (continuous or discrete) and cardinality (number of hands and tokens used for gesturing). They noticed that most interactions with active tokens were performed leaving the tokens on the surface of a (possibly interactive) table, while about one third were performed in the air. Similarly, two thirds of the gestures were intended for discrete commands, while for the remaining third the users expected continuous feedback. Finally, they noticed that 85% of the gestures were executed with only one hand and only one token, while only a few gestures exploited bimanual interaction or the combination of two or more active tokens.
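The three dimensions of this taxonomy map naturally onto a small record type. The following is a minimal sketch, not taken from Valdes et al. [209]: the class names, the field names and the four example gestures are hypothetical and serve only to show how gestures could be annotated along space, flow and cardinality, and how proportions such as those reported in the study could then be computed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Space(Enum):
    """Where the gesture takes place (assumed two-valued for this sketch)."""
    ON_SURFACE = "on-surface"
    IN_AIR = "in-air"


class Flow(Enum):
    """Whether the gesture issues a discrete command or a continuous one."""
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


@dataclass(frozen=True)
class TokenGesture:
    """One gesture annotated along the three taxonomy dimensions."""
    name: str        # hypothetical label, not a gesture from the study
    space: Space
    flow: Flow
    hands: int       # cardinality: number of hands involved
    tokens: int      # cardinality: number of active tokens involved


def share(gestures: Iterable[TokenGesture],
          predicate: Callable[[TokenGesture], bool]) -> float:
    """Fraction of gestures that satisfy the predicate (0.0 if none given)."""
    gestures = list(gestures)
    if not gestures:
        return 0.0
    return sum(1 for g in gestures if predicate(g)) / len(gestures)


# Illustrative data only; these entries are invented for the example.
EXAMPLE_GESTURES = [
    TokenGesture("slide", Space.ON_SURFACE, Flow.CONTINUOUS, hands=1, tokens=1),
    TokenGesture("tap", Space.ON_SURFACE, Flow.DISCRETE, hands=1, tokens=1),
    TokenGesture("shake", Space.IN_AIR, Flow.DISCRETE, hands=1, tokens=1),
    TokenGesture("stack", Space.IN_AIR, Flow.DISCRETE, hands=2, tokens=2),
]

in_air = share(EXAMPLE_GESTURES, lambda g: g.space is Space.IN_AIR)
discrete = share(EXAMPLE_GESTURES, lambda g: g.flow is Flow.DISCRETE)
one_hand_one_token = share(
    EXAMPLE_GESTURES, lambda g: g.hands == 1 and g.tokens == 1
)
```

Keeping cardinality as two integers rather than a single label leaves room for bimanual and multi-token gestures without extending the type.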
2.4.2 Designing Frameworks
Some design guidelines for tangible gesture interaction can be found in the frameworks presented in 2.4.1. All the design guidelines found in the literature review were basic and often specific to a particular type of tangible gesture.
Hoven and Mazalek [213] provided general design guidelines for tangible gesture interaction.
Their first suggestion is to fit the design of tangible gesture interaction to the particular context of use, according to the application domain and the related target users. Depending on the application domain, the users could require more or less accuracy in the gesture recognition, which influences the choice of the technology for detecting and recognizing gestures. Because of the novelty of the domain, Hoven and Mazalek recommended exploring new solutions and new contexts of use.
Wolf et al.’s [231] guidelines are specifically oriented to the design of microgestures for secondary tasks, exploiting the free attentional and motor resources of the user while performing less demanding or automated primary tasks, such as driving. The provided taxonomy helps the designer choose gestures according to ergonomic considerations, while the choice of technology for gesture recognition should be adapted to the different types of gestures: the authors suggest EMG for pressing gestures and accelerometers for tapping.
Valdes et al. [209] discussed some implications for the design of gesture interaction with active tokens, derived from the gesture elicitation study. They suggested exploiting interaction beyond the surface on which tokens are placed (often a multitouch interactive table for data visualization), as well as providing continuous feedback for some of the interactions with the data through active tokens. In relation to tangible gesture design, they suggested reusing gestures across multiple commands to facilitate learnability and memorability of gestures. Moreover, they suggested designing custom active tokens to provide physical affordances, metaphors and constraints to the users. In a later work, Okerlund et al. [152] attached physical objects to the active tokens and discovered that, although users had more difficulty figuring out the interaction compared to intangible representations in the token, they engaged longer in the interaction, with a better understanding of the task (biological experiments in a museum exhibition).
2.4.3 Building Frameworks
Building frameworks for tangible gesture interaction are specific to the different types of gestures.
Wimmer’s GRASP framework [228] provides directives for building grasp sensing systems. Three steps are required: capture, identification and interpretation. Different techniques allow capturing grasps. The user’s hand can be tracked with computer vision techniques to estimate the 3D joints of the hand, although it is often difficult to assess whether there is contact between the hand and the object. Data gloves with pressure sensors on the fingertips also make it possible to estimate the contact forces with the objects, but they reduce the sensitivity of the user’s tactile perception. RFID readers attached to the user’s wrist can detect when the user holds an object in the hand, but do not allow determining the particular grasp. Alternatively, the surface of the object could be instrumented with capacitive sensors or pressure sensors. In both of the latter cases, the purpose of the capture step is to obtain the grasp signature, i.e., the digital representation of the contact points between the hand and the object or of the digit positions. In the identification step, the computer should classify the signature using either heuristics or machine learning. Finally, once grasps are identified, they should be interpreted by the system to produce some output.
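The capture, identification and interpretation steps can be sketched as three small functions. This is only an illustration under stated assumptions, not Wimmer's implementation: the grasp signature is assumed to be the set of sensor cells in contact on an instrumented object, identification uses a simple heuristic (nearest template by Jaccard similarity) rather than machine learning, and the templates, thresholds, grasp labels and actions are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Sequence

# Assumed representation: indices of the surface sensor cells that are touched.
GraspSignature = FrozenSet[int]


def capture(readings: Sequence[float], threshold: float = 0.5) -> GraspSignature:
    """Capture: turn raw sensor readings into a grasp signature."""
    return frozenset(i for i, value in enumerate(readings) if value >= threshold)


# Hypothetical templates for an object covered by six sensor cells.
TEMPLATES: Dict[str, GraspSignature] = {
    "pinch": frozenset({0, 1}),
    "wrap": frozenset({0, 1, 2, 3, 4, 5}),
}


def identify(signature: GraspSignature,
             templates: Dict[str, GraspSignature] = TEMPLATES,
             min_similarity: float = 0.6) -> Optional[str]:
    """Identification: heuristic nearest-template match (Jaccard similarity)."""
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        union = signature | template
        score = len(signature & template) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_similarity else None


def interpret(label: Optional[str], actions: Dict[str, str]) -> str:
    """Interpretation: map an identified grasp to a system output."""
    if label is None:
        return "ignore"
    return actions.get(label, "ignore")


# Hypothetical mapping from grasps to application commands.
ACTIONS = {"pinch": "select", "wrap": "pick-up"}

readings = [0.9, 0.8, 0.1, 0.0, 0.2, 0.1]
command = interpret(identify(capture(readings)), ACTIONS)
```

Separating the three steps keeps the sensing technology replaceable: a vision-based or glove-based capture function could feed the same identification and interpretation code, provided it produces a comparable signature.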
Ferscha et al. [64] proposed a framework to recognize orientation gestures with artifacts.
Exploiting an accelerometer that can be attached to the user’s hand or to an artifact, the authors proposed a gesture library to recognize three types of gestures (32 gestures in total): orientation gestures of the user’s hand, orientation gestures of a small artifact that can be held in the hand, and orientation gestures with larger artifacts, which are manipulated occasionally by the users.
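To illustrate how an accelerometer can reveal orientation, the sketch below labels each sample by the axis along which gravity dominates and reports the changes between stable orientations. It is a generic illustration and not the gesture library of Ferscha et al. [64]: the function names, the orientation labels and the 0.8 g threshold are assumptions, and readings are assumed to be expressed in units of g.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float, float]  # (ax, ay, az) in units of g (assumed)


def dominant_orientation(ax: float, ay: float, az: float,
                         min_g: float = 0.8) -> str:
    """Label a sample by the axis that carries most of gravity.

    Returns a label such as "+z" or "-y", or "unknown" when no axis
    reaches the (assumed) min_g threshold, e.g. while the object is tilted
    halfway or being moved.
    """
    components = {"x": ax, "y": ay, "z": az}
    axis = max(components, key=lambda name: abs(components[name]))
    value = components[axis]
    if abs(value) < min_g:
        return "unknown"
    sign = "+" if value > 0 else "-"
    return sign + axis


def orientation_changes(samples: Iterable[Sample],
                        min_g: float = 0.8) -> List[Tuple[str, str]]:
    """Return (from, to) pairs for each change between known orientations."""
    changes: List[Tuple[str, str]] = []
    previous = None
    for ax, ay, az in samples:
        current = dominant_orientation(ax, ay, az, min_g)
        if current == "unknown":
            continue  # skip transitional samples
        if previous is not None and current != previous:
            changes.append((previous, current))
        previous = current
    return changes


# Hypothetical trace: an artifact lying face up is flipped face down.
trace = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5), (0.0, 0.0, -1.0)]
flips = orientation_changes(trace)
```

A real recognizer would also need filtering and timing constraints to separate deliberate orientation gestures from incidental movement; those are omitted here.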
Klompmaker et al. [118] proposed dSensingNI, a framework for detecting interactions with objects based on depth sensing (Kinect). The framework is able to detect finger touches on arbitrary surfaces (including objects); finger, hand and arm gestures; arbitrary physical objects; object-object interactions (such as grouping and stacking); and hand-object interactions (such as moving, grasping and releasing). Moreover, the framework is robust to occlusion caused by hands over the objects.