Chapter 8 • General discussion
8.2 Limits of these studies and further work
While the three studies presented here describe how a coordination device such as MLA affects coordination, they are subject to limitations. Five main limits are described below. The question of generalization to other settings or populations must be considered in light of these limitations.
First, there might be concerns with the type of collaboration settings we deployed in the three studies. The two games we designed only immersed small groups of people in a decentralized collaborative activity. Unlike many other collaborative tasks, there was no formal, normative procedure for solving the joint problem. In some collaborative situations (firefighting, the army), coordination is more controlled because a control room or a central command group receives information and then dispatches orders to participants in the field. This is often the case in collaborative situations that involve spatial activity. An additional way to constrain coordination devices is the development of very precise procedures that could be described as conventions. A further limitation of the collaboration setting we developed is that the tasks lasted a short amount of time and consequently only required synchronous coordination between peers. Additionally, it is important to state that the semantics of the two proposed tasks are quite simple. For example, the spatial nature of CatchBob! makes the game very simple and based on spatial reasoning: it is therefore less possible to draw complex inferences between the environment (spatial layer) and the problem space. Finally, for the first game, a potential limitation arises from the task and the nature of the experimental studies. An experimental session held in a computer lab or in a quasi-controlled physical environment is indeed not likely to represent a “real” experience, and the computer games we designed for the manipulation were necessarily limited in scope. While this potential limitation also exists in the second experiment, the interviews after CatchBob! revealed that participants believed they were participating in and evaluating a real game. We would definitely benefit from investigating the same issues during a more ecological task, such as firefighter missions or police investigations. In these contexts, collaboration would be bound by complex inferences about the spatial environment based on people’s norms and procedures as well as their socio-cultural background, and a richer common ground would arise.
A second limit is the participant sample. The degree of familiarity between players within groups was only loosely controlled in the two experiments. In the first experiment, people did not know each other, whereas the contrary was true in the CatchBob! studies. Furthermore, in this game, we also controlled for knowledge of the physical environment, since all participants were students of the school. Such controlled sampling diminished the ecological validity of the task, since in real settings the relationships between people are more mixed and the level of knowledge regarding the physical environment should also be more diverse. It would therefore be interesting to conduct field experiments with mixed groups who have different degrees of familiarity with each other and with the environment.
Third, in these three experiments, each group only played one game, which might be an issue in terms of learning the different elements of the situation: the interface, the task and its rules. One possible way to see whether the results still hold over time is repeated play, as described in Barkhuus et al. (2005), or a crossed experiment in which players from one condition play a second game in the other condition. We could then imagine having participants play multiple sessions of Spaceminers or CatchBob! with different intervals of time between the game sessions, so that we could see how the results hold over time.
The fourth main limit refers to the methodological choices we made in order to understand mutual modeling and coordination. In the first study, mutual modeling accuracy was measured during the task through a simple and subjective questionnaire. In the second and third studies, we used a different indicator, namely the number of mistakes made about the partners’ spatial behavior. The two measures differ, which does not allow comparison between the three experiments, and both have shortcomings. Measuring mutual modeling during the game can be disruptive or can trigger a modeling process that alters the natural one. Evaluating mutual modeling after the game may introduce mnemonic and rationalization biases; it can also lead the researcher to evaluate recall rather than in-task modeling. In other words, the abstract and unobservable characteristics of the mutual modeling process imply methodological challenges that call for indirect measures and assessment methods. Furthermore, mutual modeling in everyday life involves a large variety of mental states to be represented, such as knowledge, behaviors, beliefs, desires, intentions, emotions, traits and attitudes. Three of these are particularly relevant in collaborative learning situations, namely inferences about partners’ knowledge, behavior and goals (intentions). Study 1 focused essentially on inferences about partners’ intentions by using an ‘on-task’ questionnaire. In Study 2 we investigated inferences about partners’ behavior by using an ‘after task’ assessment method. This variable should be evaluated with a more objective method, which is why future work should be directed towards finding a way to compare what player A says B is going to do with what B actually does during the game. Such a solution might allow researchers to benefit from both approaches.
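As an illustration of the comparison suggested above, the following is a minimal sketch of how such a measure could be computed from game logs. It is not the method used in the three studies. It assumes that each stated prediction and each logged action carries a timestamp and a comparable action label; the names `Prediction`, `LoggedAction`, `next_action`, `modeling_accuracy` and the 60-second `horizon` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    """What one player says a partner is going to do (hypothetical record)."""
    predictor: str         # player stating the prediction (A)
    target: str            # player being modeled (B)
    time: float            # seconds since game start
    predicted_action: str  # assumed to use the same labels as the game log


@dataclass(frozen=True)
class LoggedAction:
    """What a player really did, as recorded by the game (hypothetical record)."""
    player: str
    time: float
    action: str


def next_action(log: list[LoggedAction], player: str,
                after_time: float, horizon: float) -> Optional[str]:
    """Return the first action of `player` in (after_time, after_time + horizon]."""
    candidates = [e for e in log
                  if e.player == player
                  and after_time < e.time <= after_time + horizon]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.time).action


def modeling_accuracy(predictions: list[Prediction], log: list[LoggedAction],
                      horizon: float = 60.0) -> dict[tuple[str, str], float]:
    """Share of correct predictions for each (predictor, target) pair.

    Predictions with no logged action inside the horizon are skipped,
    because they can be neither confirmed nor refuted.
    """
    counts: dict[tuple[str, str], tuple[int, int]] = {}
    for p in predictions:
        actual = next_action(log, p.target, p.time, horizon)
        if actual is None:
            continue
        hits, total = counts.get((p.predictor, p.target), (0, 0))
        counts[(p.predictor, p.target)] = (
            hits + (actual == p.predicted_action), total + 1)
    return {pair: hits / total for pair, (hits, total) in counts.items()}
```

Under these assumptions, the predictions are collected during the task while the scoring relies only on logged behavior, which is one way the on-task and after-task approaches could be combined. The choice of horizon and of action labels would need to be justified for each game.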
The final limit is that we tested awareness tools only with pairs or triplets; in the context of multi-user systems with 4 to 50 users, the use of awareness tools is likely to change. Paying attention to awareness cues left by 50 users would be more complicated and difficult than attending to those left by groups of two or three, and more study of the use of awareness tools by large groups of people is needed.