Computational approaches to dialogue modelling

This section explores computational approaches to dialogue modelling. Dialogue modelling can be defined as a branch of knowledge that investigates the structure and the process of dialogue in human-computer interaction (HCI) (Jokinen, 2009, Bel-Enguix et al, 2009). Computational studies provide the foundation for building a model that emulates human performance (Ginzburg and Fernandez, 2013). As noted earlier in this chapter, various linguistic theories contributed to my understanding of the structure and the process of conversations. Consequently, I identified different approaches to modelling dialogue. There are three main approaches to designing and building a model of dialogue: a) the grammar-based approach, b) the plan-based approach and c) the collaborative or joint action approach (Cohen, 1997, Kshirsagar et al, 2005, Josefa et al, 2006, Calking et al, 2007, Jokinen, 2009, Pietquin, 2004 and 2009, Moller, 2006). These approaches, along with the advantages and disadvantages of each, are explored in the following sections.

Grammar-based approach

A dialogue grammar approach focuses on the rules that govern the mechanism and the structure of dialogue. These rules can be observed by exploring the “sequencing regularities in dialogue” (Cohen, 1997, p.253). The idea behind this approach is similar to Chomsky’s theory of the rules, or grammars, that govern the structure of the sentence (Sinclair and Coulthard, 1975, Cohen, 1997, Kshirsagar et al, 2005). A grammar-based approach builds on the descriptive system of discourse units proposed by Sinclair and Coulthard (1975), Coulthard and Brazil (1992) and Stubbs (1983), and on the theories of turn-taking and adjacency pairs proposed by Schegloff and Sacks (1973) and Sacks et al (1974). Identifying the sequence of utterances in a dialogue makes it possible to describe and model the whole structure of the dialogue from start to end (Kshirsagar et al, 2005). Modelling dialogue based on grammars requires terminal and non-terminal elements (Cohen, 1997). Exchange structure (e.g. initiate, re-initiate and respond) and adjacency pairs (e.g. question/answer and greeting/greeting) describe the high level of dialogue structure, that is, the non-terminal elements, whereas conversational acts (Searle, 1969 and 1971) describe the lowest level of interaction, that is, the terminal elements.

Initiation: What is your name? (Question)

Response: My name is Botocrates. (Answer)

Example 20: A simple dialogue grammar
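As a minimal sketch, the pairing shown in Example 20 could be expressed as a small rule table in which each initiating act licenses exactly one responding act. The rule table, the act labels and the function names below are illustrative assumptions and are not taken from any of the cited models.

```python
# Hypothetical adjacency-pair rules: each initiating act (key) licenses
# exactly one responding act (value). The labels are illustrative only.
ADJACENCY_PAIRS = {
    "question": "answer",
    "greeting": "greeting",
}


def expected_response(act):
    """Return the act the grammar expects next, or None if no rule applies."""
    return ADJACENCY_PAIRS.get(act)


def is_well_formed(acts):
    """Check that a sequence of acts is a series of complete adjacency pairs.

    This corresponds to the toy grammar
        Dialogue -> Exchange*
        Exchange -> Initiation Response
    where the acts themselves are the terminal elements.
    """
    if len(acts) % 2 != 0:
        return False  # an initiation was left without a response
    for initiation, response in zip(acts[0::2], acts[1::2]):
        if expected_response(initiation) != response:
            return False
    return True


print(is_well_formed(["greeting", "greeting", "question", "answer"]))
print(is_well_formed(["question", "greeting"]))
```

Any act that falls outside the rule table is simply rejected, which gives a concrete picture of why fixed pairing rules suit only simple, well-structured dialogues.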

A dialogue grammar approach can be useful for modelling a simple dialogue for a well-structured task (Moller, 2006, Jokinen, 2009). Most dialogue models based on this approach are simple because the structure of the dialogue follows fixed rules for pairing dialogue acts (Pietquin, 2004). When the dialogue is more elaborate, it is difficult for the grammar rules or the state transitions to handle different situations and act appropriately (Mozgovoy, 2009, Jokinen, 2010). One practical concern with a dialogue grammar model is the extent to which the system can rely on clear criteria for correctly selecting a particular act as its next move (Cohen, 1997). To build a model of dialogue that communicates with users in natural language, the model must be able to deal with any utterance and take into account any miscommunication that may occur (Jokinen, 2009, Frederking, 2012). A grammar-based approach is not suitable for such situations, in which the system must take control of a complex dialogue structure and implement it accurately (Jokinen, 2010).

Plan-based approach

A plan-based approach rests on the assumption that a speaker has a particular intention, namely to achieve a certain goal, and that the listener should discover this goal (Cohen and Perrault, 1979, Allen and Perrault, 1979). This approach to modelling dialogue is concerned not only with the direct goal but also with the potential hidden plan, the so-called “sub-goal” (Moller, 2006, p.28). A plan-based approach does not rely only on the semantic features of utterances; it pays more attention to the pragmatic goal. In contrast to the dialogue grammar approach, this approach is based on the observation that utterances are not merely a collection of words (Cohen, 1997).

 People are rational agents who are capable of forming and executing plans to achieve their goals.

 They are often capable of inferring the plans of other agents from observing that agent perform some action.

 They are capable of detecting obstacles in another agent's plans. (Allen and Perrault, 1979, p.3)

Example 21 illustrates the notion of this approach:

Abdul: Is Aisha’s room in this building?

The receptionist: First floor, Room G.10.

Example 21: Recognizing the speaker’s plan

Even though this is a yes/no question, it would be unsuitable for the receptionist to answer simply ‘yes’ (Allen and Perrault, 1979), because such an answer would violate the maxim of quantity, one of Grice’s maxims. The receptionist was able to recognise the speaker’s plan and the obstacle in this plan, which was not knowing the location of Aisha’s room. According to this approach, if the model is able to recognise speakers’ plans, it can deal with indirect speech acts (Litman, 1985, Kshirsagar et al, 2005). The sub-goal of the speakers can be inferred by considering the context of the plan (Pietquin, 2009). This approach is more efficient than a dialogue grammar approach, provided that the speaker’s plan correctly matches the plan recognised by the listener (Moller, 2006, Pietquin, 2009). Therefore, a dialogue model based on this approach requires a restrictive context (Allen and Perrault, 1979).

In restrictive domains, such as the train station, identifying the fundamental goal (i.e. boarding, meeting) is sufficient to identify the subgoals desired. In such settings, very brief fragments can be used successfully.

(Allen and Perrault, 1979, p.56)

However, creating an agent based on a plan-based approach is very complex because it requires a dynamic process of plan detection (Hong and Cho, 2003, Pietquin, 2009). Dynamic detection involves a plan schema and meta-plans (plans about a certain plan) (Litman, 1985). The plan schema includes an action schema, which consists of preconditions and effects. The preconditions are the conditions that must hold before a speech act can be applied, and the effects are the conditions that become true after the act has been performed (Allen and Perrault, 1979, Suchman, 2007, Stent and Bangalore, 2014).
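To make the notions of preconditions, effects and obstacles more concrete, the following minimal sketch models an action schema as two sets of propositions and revisits the receptionist exchange in Example 21. It is a simplified illustration and not the formalism used by Allen and Perrault (1979): the class, the function names and the proposition labels are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSchema:
    """An action described by what must hold before it and what holds after it."""
    name: str
    preconditions: frozenset
    effects: frozenset


def obstacles(action, state):
    """Return the preconditions of `action` that do not yet hold in `state`."""
    return action.preconditions - state


def perform(action, state):
    """Return the state after `action`, or raise if a precondition is unmet."""
    missing = obstacles(action, state)
    if missing:
        raise ValueError(f"unmet preconditions: {sorted(missing)}")
    return state | action.effects


# Hypothetical schemas for Example 21 (all proposition labels are assumed).
INFORM_LOCATION = ActionSchema(
    name="inform_location",
    preconditions=frozenset({"receptionist_knows_location"}),
    effects=frozenset({"visitor_knows_location"}),
)
VISIT_ROOM = ActionSchema(
    name="visit_room",
    preconditions=frozenset({"visitor_in_building", "visitor_knows_location"}),
    effects=frozenset({"visitor_at_room"}),
)

state = frozenset({"visitor_in_building", "receptionist_knows_location"})
print(obstacles(VISIT_ROOM, state))    # the obstacle in the visitor's plan
state = perform(INFORM_LOCATION, state)  # the receptionist removes the obstacle
state = perform(VISIT_ROOM, state)       # the visitor's plan can now proceed
```

In this sketch the obstacle is simply an unmet precondition of the inferred plan, and the helpful reply is the act whose effects supply it. Recognising which plan the speaker actually holds is the hard part, and it is not modelled here.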

Collaborative or joint action approach

A collaborative or joint activity approach is based on the observation that both parties in a dialogue are responsible for sustaining and feeding the dialogue (Cohen, 1997, Josefa et al, 2006). In contrast to the previous approaches (grammar-based and plan-based), this approach emphasises clarification, confirmation and mutual understanding between partners in dialogue (Kshirsagar et al, 2005), which are essential components of human behaviour in interactive dialogue (Cohen, 1997). In joint collaborative activities, the success of the interaction relies on appropriate coordination between participants’ actions (Clark and Schaefer, 1989). Lochbaum (1993) suggests that dialogue should not be viewed as merely a fixed structure in which one turn is followed by another. In interactive dialogue, especially task-oriented dialogue, we have to consider the sub-dialogues or segments that occur during the dialogue in order to improve the chances of completing the task successfully (ibid).

Closer analyses of face to face communication indicate that conversation is not so much an alternating series of action and reactions between individuals as it is a joint action accomplished through participants’ continuous engagement in speaking and listening.

(Suchman, 2007, p.87)

Indeed, contributions in dialogues are affected by participants’ experience, which is part of the “baggage e.g. prior beliefs, assumptions and other information” that they carry with them (Clark and Schaefer, 1989, p.260). Within this baggage there is some common ground that facilitates mutual understanding in a dialogue (Stalnaker, 1978 and 2004). Jurafsky and Martin (2009) suggest that confirmation moves, clarification moves and acknowledgements of the speaker are part of the process of establishing common ground between interlocutors. This process in turn helps the participants to complete the task successfully and to perform the right actions (Jurafsky and Martin, 2009, Zacarias and De Oliveira, 2012). Table 9 below illustrates the types of evidence of understanding proposed by Clark and Schaefer (1989).

No | Type of evidence | Description
1 | Continued attention | B shows she is continuing to attend and therefore remains satisfied with A’s presentation.
2 | Initiation of the relevant next contribution | B starts in on the next contribution that would be relevant at a level as high as the current one.
3 | Acknowledgement | B nods or says “uh huh,” “yeah,” or the like.
4 | Demonstration | B demonstrates all or part of what he has understood A to mean.
5 | Display | B displays verbatim all or part of A’s presentation.

Table 9: Types of evidence of understanding between participants

(Clark and Schaefer, 1989, p.267)
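As a rough sketch of how a dialogue system might use the categories in Table 9, the evidence types can be encoded as an enumeration and used to decide whether the system should carry on or ask for confirmation. The decision rule is an assumption made purely for illustration: it treats a higher row number in Table 9 as stronger evidence and applies an arbitrary threshold, neither of which is prescribed by the sources cited above.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """The five types of evidence of understanding, numbered as in Table 9."""
    CONTINUED_ATTENTION = 1
    RELEVANT_NEXT_CONTRIBUTION = 2
    ACKNOWLEDGEMENT = 3
    DEMONSTRATION = 4
    DISPLAY = 5


def next_move(observed, required=Evidence.ACKNOWLEDGEMENT):
    """Choose the system's next move from the evidence the user has given.

    `observed` is the set of Evidence values detected after the system's
    contribution. Treating a higher table number as stronger evidence, and
    the `required` threshold itself, are assumptions of this sketch.
    """
    if observed and max(observed) >= required:
        return "continue"
    return "request_confirmation"


print(next_move({Evidence.DISPLAY}))
print(next_move({Evidence.CONTINUED_ATTENTION}))
```

Detecting the evidence in the first place, for example recognising that a user reply is a demonstration rather than a new contribution, is left open here and would depend on the natural language processing available to the system.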

In addition, a collaborative or joint activity approach requires an understanding of individuals’ motivations behind the interaction (Moller, 2006). Understanding participants’ motivations and beliefs leads to better identification and specification of the model and its structure. This approach employs concepts from both the grammar-based and the plan-based approaches (Moller, 2006, Shi et al, 2010, Pietquin, 2004 and 2009). According to joint action theories, predicting the mechanism of interaction between the parties in a dialogue helps to satisfy and achieve their goals (Dino and Chella, 2013). The goals of participants in a dialogue can be achieved by modelling and combining the mutual intentions in collaborative dialogue (Lochbaum, 1993). However, this approach requires a high degree of natural language processing (Moller, 2006).

It could be said that the purpose of the interaction and its complexity play a crucial role in determining which approach to dialogue modelling is appropriate (Cohen, 1997). The process of argumentation does not have a rigid, fixed structure because different conversational behaviours may occur. Walton (2006) suggests that the word ‘argumentation’ refers to the dynamic process that occurs during interactive dialogue between two individuals. The sequence of moves that controls the flow of the dialogue is shaped and moulded by both parties (Botocrates and the user). Walton (2007) states that:

Argumentation is seen as a dynamic process in which one party puts forward an argument that may change and develops as it is confronted in a dialog with the question, doubts and criticisms of another party who may or may not accept the argument.

(Walton, 2007, p.1)

Argumentation is a joint activity where the two parties (Botocrates and the user) should feed and contribute to the development of the argumentation process (Anderiessen and Schwarz, 2009). A joint activity involves a set of behaviours which need to be considered and coordinated (Ahrndt and Albayrak, 2016). The set of behaviours during the interaction between users and Botocrates is represented in their conversational moves (Chen and Jokinen, 2010, Bunt, 2013b). Therefore, the main challenge of designing Botocrates’ dialogue strategies and tactics is to take into account different conversational scenarios (Nishida et al, 2014). These scenarios have to be devised to represent the dialogue algorithm that could deal with various communicative moves (Andrews and Quarteroni, 2011).
