A framework for context-aware gestural interaction

Chapter 3: Context-Aware Gestural Interaction

3.2 A framework for context-aware gestural interaction

The literature review showed how the context awareness is treated in the development of a system in the ubiquitous computing era. In particular, the application of the contextual information changes with reference to the system layer. In the human-computer interaction, the context-aware services are of great value because they can enhance the user experience in many aspects. Using this information can be treacherous and a framework could improve the computer scientists working in the HCI field to create new systems that can better exploit the contextual information to improve the user interface. As first step, it was necessary to choose the best approach for the representation of the contextual information. The four standard approaches for context modeling and reasoning that are the protagonists of the current state-of-the-art:

 Object-Role Modeling, in particular Context Modeling Language

 Spatial models

 Ontology-based model

 Hybrid approach

Since the goal was to create a framework for the design and development of an interface for the human-smart environment interaction, the spatial model was the obvious choice. In fact, the spatial reference plays a key role in the interaction between humans and the environment. The context information can be gathered from the spatial representation of the entities that are present in the environment. The entities are divided in two types: the humans and the smart objects. The spatial relationship between these entities can characterize the interaction. The spatial model is composed of different levels and these levels can provide different information about the five categories that describe the entities and their contexts. The scalability of space model can be expressed as geographical position and relative location:

 Geographical position: country (social-ethnographic information, if in his country with historical data, if in holyday, hour).

 Geographical position: city (information about transferring from one city to another: business travel, holidays, home et cetera).

 Geographical position: specific building (at work, at home, at gym et cetera).

 Relative location: specific room referring to the specific building (kitchen, living room, bedroom et cetera).

 Relative location: coordinates referring to the specific room.

The spatial relationship between the different entities that are present in the same room refers to the concept of proxemics, which derives from the psychology; E. T. Hall gave the definition of proxemics as “the study of how man unconsciously structures microspace—the distance between men in the conduct of daily transactions, the organization of space in his houses and buildings, and ultimately the layout of his towns” (Hall, 1963). In (Vogel & Balakrishnan, 2004), the authors elaborated this concept applying it to the public displays. They took Hall’s interpersonal interaction zones: intimate (less than 1.5 feet), personal (1.5 to 4 feet), social (4 to 12 feet), and public (12 to 25 feet). Then, they characterized the human-display interaction in the same manner. Some years later, Greenberg et al. presented a model based on proxemics theory for the interaction between people and displays, extending Vogel and Balakrishnan’s work (Greenberg et al., 2011). The difference is that in this proxemics ecology, the displays are composed of digital surfaces, portable personal devices and information appliances. This scenario is much more similar to Weiser’s scenario of ubiquitous computing interface. In order to model the proxemics for this ubiquitous computing interface, they introduced five dimensions to characterize the entities and to describe the interaction. These dimensions are: distance, orientation, movement, identity and location. However, this scenario excluded the possibility of interaction through air gestures and tangible gestures. Moreover, some of these measures are not actually sensed but they can be calculated from the other measures. For instance, the distance measure can be calculated using the positions of the different entities. For this reason the dimensions acquired from the sensors can be transformed in: orientation, movement, identity, location, time and state. The orientation and location describe the positions of the different entities in the environment. The identity

and the status dimensions provide some information about the characterization of the specific entities. The time is important especially for the construction of a history of the interaction and for the habits. The movement dimension refers to the Oxford English Dictionary definition: “The power or facility of voluntary movement of a part of the body.” In particular, this dimension represents the movements performed by the user, i.e., gestures, or the moving parts of a smart object.

These dimensions represent the data acquired by the sensors and they must be elaborated to obtain a higher level of the understanding of the context. The different levels of understanding can be classified as in the Data Information Knowledge Wisdom (DIKW) pyramid (Rowley, 2007), see Figure 3.1.

Figure 3.1 The Data Information Knowledge Wisdom (DIKW) pyramid. The data are raw; they represent facts and have no significance beyond their existence. The information is elaborated data: the data have been given meaning by way of relational connection and the information embodies this relationship. The knowledge is the appropriate collection of information, such that its intent is to be useful. The information has been elaborated in order to find patterns, which provide a high level of predictability as to what is described or what will happen next. The wisdom is more complex. It is an extrapolative process and it calls upon all the previous levels of knowledge and consciousness. Wisdom

embodies more of an understanding of fundamental principles embodied within the knowledge that are essentially the basis for the knowledge being what it is. Bellinger et al. gave this description and the following graph is their graphical interpretation of the links between the different layers (Bellinger et al., 2004). Figure 3.2 represents the different levels of information elaboration associated to the DIKW layers, image adapted from (Bellinger et al., 2004).

Figure 3.2 information elaboraton with reference to the DIKW pyramid. The six dimensions for the representation of the gestural interaction in a smart environment are in the lowest layer of the pyramid. On the other layers, the data can be elaborated using different relations and principles. The choice of such parameters depends on the application. The spatial model for the characterization of the gestural interaction is important and a first measure that can be calculated from the dimensions is the distance between the entities. In fact, other works suggested to model the area around the user as spatial region where it is possible to manipulate objects (Surie et al., 2007; Surie et al., 2010). The objects that are in the reach of the user can be manipulated; if these objects are augmented ones, then they integrate computational capabilities, allowing the user to interact with them through contact gestures. For instance, a user can

interact with smart surfaces through touch gestures and with smart artifacts through tangible gestures. The research showed that another way of interacting with the environment, i.e., with the smart objects, it is also through distance gestures, which are performed in the air. The problem associated to this kind of distance interaction concerns the identification of the target of the gesture. One solution could be creating a gesture taxonomy that presents a set of gestures for each smart object. In this case, every command is associated to a different gesture. This approach makes the number of gestures to increase with the number of possible commands. Increasing the number of smart objects augments the number of gestures that a user has to learn, which represents a problem in terms of memorability and cognitive load. The opportune use of the contextual information allows the adoption of a second approach. In this case, the elaboration of the data of the six dimensions can provide adequate information to understand the user’s intention and focus. In fact, other researchers already emphasized the issue and the importance related to identifying the user’s focus of attention for natural interaction in smart environments (Shafer et al., 2001). This information can help to disambiguate the aim and the meaning of a specific gesture performed in a determined situation. For example, the orientation and the distance of the object can determine the target of the command (Figure 3.3).

Figure 3.3 Example of spatial model for the human-environment interaction. In Figure 3, the spatial model allows the representation of the user in the environment and the orientation and the distance allows understanding the focus of the general attention. The movement, in this case a deictic gesture, can help the system to identify the specific target of a command. The possibility of identifying the user’s focus brought to the creation of a novel concept called functional gestures.

In document Context-aware gestural interaction in the smart environments of the ubiquitous computing era (Page 76-81)