
3.2 Speech versus Gestural Interaction

3.2.4 Discussion

For navigation, our hypothesis in favor of gesture input was confirmed by its higher usage and stated preference, as well as by the fact that gestures were rated as easier to learn. This is in line with Kadobayashi et al. [82], who considered walking gestures to be more intuitive for navigation than using a mouse. However, the results might differ for a navigation approach with direct target selection.

For the selection task, we found no significant differences; only the stated preferences indicate a tendency toward speech. One reason might be distinctions between the selection targets, as three participants mentioned that they liked to reach for an object with their hands, whereas two preferred addressing characters by speech. Different sizes and placements of the objects might have further influenced the modality choice, as some objects were more difficult to point at than others, similar to the observations of van der Sluis and Krahmer [178].

The hypothesis that speech would be preferred for dialogue, as derived from Cavazza et al. [25], was clearly confirmed. All participants named it as their preferred modality, it was used most of the time (nine participants even used it for every single sentence), and the user ratings were very positive, with all items close to the extremes. Despite this clear result, it should be mentioned that some dialogue utterances can be naturally represented by gestures, e.g. nodding for “yes” or a greeting gesture for “hello”, but this is not the case for arbitrary sentences.

We assumed a preference for gestures in the object manipulation task, but this hypothesis could not be confirmed, as both modalities were used with almost equal preference and the user ratings were even in favor of speech. A similar variety of modalities was observed by Corradini and Cohen [32], who additionally reported that users preferred to use both gestures and speech in a multimodal way.

Hints of another explanation could be found in the study, as the users seemed to follow two different behavior patterns. Speech users seemed to be more focused on progressing, often calling out the actions as soon as they appeared on screen instead of first watching the gesture animations to figure out how to perform them. Gesture users, on the other hand, seemed to perform the task in a consciously more natural way, and some also exhibited role-playing behavior such as worrying about being heard by the virtual characters. Therefore, interaction designers should investigate their target group’s preferences and decide between a more natural and engaging object manipulation using gestures and a faster one using short speech commands.

3.2.5 Conclusion

In this section, I examined which modality users preferred for four main interaction tasks in a virtual environment. We conducted a study on a system in which we successfully implemented all four interaction tasks with real-time recognition for both speech and body gesture input using low-cost technology. The study confirmed that a gestural walking metaphor suits navigational tasks better, while speech was chosen for dialogues. For selection and manipulation, no clear preference was obtained, but we observed possible reasons for the different modality choices between the users. Speech is usually faster to perform for discrete interactions such as selecting an item or a sentence, whereas gestural interaction is better suited for continuous interaction in which ongoing inputs are needed over a certain duration to adjust the desired outcome, as in navigation. Nevertheless, gestural interaction also offers advantages for discrete interaction tasks, as it is not possible or desirable to use speech in all scenarios, e.g. in noisy environments, for privacy reasons, or when bodily activity is part of the application’s goals, as in fitness games.

User-Defined Full Body Gestures

In the applications of the last chapter, the gestures were chosen by the developer of the system according to his or her own preferences. They might therefore not have been the most intuitive ones for the actual users. The goal of this chapter is to better integrate the user in the design process. To this end, I adopt and modify the process by Wobbrock et al. [189] as described in Section 2.5. In contrast to Wobbrock et al., who investigated surface gestures, my goal is to identify intuitive gestures for applications with full body interaction. I use their definition to calculate an agreement score. However, I extend their process for finding gesture candidates (see Section 2.5.1) by allowing multiple levels of candidates. Thus, I do not only look at the largest subset of identical gestures M_i(a), but I propose to order all of the subsets to obtain alternative candidates in case the first candidate cannot be used, e.g. for technical reasons. In this way, I define multiple gesture candidates c_j in the following way:

$$c_j(a) = \max_{\substack{i \in \{1, \dots, n_a\} \\ M_i(a) \neq c_k(a)\ \forall k < j}} \bigl( M_i(a) \bigr)$$

As not all alternative gesture candidates c_j are represented similarly often in the set M(a), I propose that an alternative candidate should only be taken if its size is not much smaller than the size of the first candidate; e.g., one could define that an alternative is only taken into account if its size is at least half the size of the first candidate.
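The candidate ordering and the size threshold can be illustrated with a minimal sketch. It assumes that each proposal for an action a has already been reduced to a label, so that identical gestures share the same label. The function names, the parameter `min_ratio`, and the example data are hypothetical and only serve the illustration. The helper `agreement_score` assumes the per-action agreement of Wobbrock et al., i.e. the sum of squared relative subset sizes.

```python
from collections import Counter


def gesture_candidates(proposals, min_ratio=0.5):
    """Return (gesture, size) pairs for one action, largest subset first.

    proposals: one gesture label per proposal, i.e. the set M(a).
    min_ratio: hypothetical threshold; an alternative candidate is kept
               only if its subset has at least min_ratio times the size
               of the first candidate's subset.
    """
    if not proposals:
        return []
    # Group identical gestures into subsets M_i(a) and order them by size.
    subsets = Counter(proposals).most_common()
    first_size = subsets[0][1]
    return [(g, n) for g, n in subsets if n >= min_ratio * first_size]


def agreement_score(proposals):
    """Per-action agreement: sum over subsets of (|M_i(a)| / |M(a)|)^2."""
    total = len(proposals)
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in Counter(proposals).values())


# Invented example: 12 proposals for a single action.
example_proposals = ["wave"] * 6 + ["raise_arm"] * 3 + ["nod"] * 2 + ["bow"]
candidates = gesture_candidates(example_proposals)
score = agreement_score(example_proposals)
```

With the half-size rule, only subsets with at least three of the twelve example proposals remain as candidates, so the first candidate has one alternative that could be used if it turns out to be technically infeasible.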

Before applying the design process, I will develop my own taxonomy for full body interaction in Section 4.1. The taxonomy will be used for categorizing gestures in the two subsequent sections, which present two cases in which we went through the complete design process for creating user-defined full body gestures. The first study creates a gesture set for controlling a humanoid robot, and the second study investigates input gestures for the intercultural training system Traveller.

4.1 Taxonomy for Full Body Interaction

As could already be seen in the preceding sections, full body interaction offers many possibilities for interacting with a computer, depending on which body parts are used, whether the interaction is discrete or continuous, how the inputs are interpreted, what effect they have within the system, etc. In the next section, I will describe the different types of full body interaction that are investigated in this thesis. Afterwards, I will develop my own taxonomy of full body gestures, as existing ones are not perfectly suitable (cf. Section 2.2).