Output Selection using Semantic Relatedness

In this section we outline the output selection process originally implemented in the Toy. We then describe the Semantic Relatedness Selection Mechanism, focusing on the areas of the original process where our module is incorporated.

7.2.1 The Wong et al. [2012b] Selection Mechanism

The Intelligent Interactive Toy, or simply the Toy, is a joint project between RMIT University and Realthing Entertainment Pty Ltd to produce a module-based conversational agent, which can be embodied into a physical toy. The target audience for the Toy is children aged 8 to 12 years. The Toy can interact with users via conversational activities, ranging from structured activities such as storytelling and playing games, to unbounded “chatty” dialogue like responding to questions or talking about any topic that its knowledge units handle. The Toy’s capabilities for pre-processing user inputs use shallow natural language processing techniques, while its responses are built from a pre-defined pool of conversational fragments. More details on the system architecture can be found in Section 2.1.1 and elsewhere [Adam et al., 2010a;b; Wong et al., 2012a;b;c].

The process of output selection in the Toy is currently performed in real time after the system processes a user input. Output selection is based on keyword matching, word frequency statistics, and the edit distance [Levenshtein, 1966] between the user input and each candidate fragment (see Section 2.1.1). As we show below, this mechanism is very fast at choosing an output, taking on average 0.12 seconds to select a fragment. However, it is mostly reactive towards the user input and depends on terms that have been mentioned in earlier turns of a conversation [Wong et al., 2012b]. Moreover, we want to make the Toy capable of driving coherent conversations.

The Conversation Manager, which was described in Section 2.1.1, is the main component of the Toy architecture. It addresses open chat and manages execution of three main activities, hereafter referred to as the Wong et al. [2012b] Selection Mechanism:

1. User Input Analysis: in this task, the Toy generates or enriches a context with the set of topics in a conversation by performing a shallow linguistic analysis over the user input.

2. Candidate Selection: this task consists of retrieving a set of candidate outputs from the pool of conversational fragments using a ranking technique involving the context.

3. Candidate Scoring and Retrieval: in this task, candidates are scored using the edit distance [Levenshtein, 1966] between the user input and each candidate contrastive component. The candidate contrastive component can be either the question or the answer in a QA-fragment, depending on the selection made by the system to respond to the user. The candidate contrastive component with the lowest edit distance (i.e. the most similar to the input) is selected, and the other component (either the question or the answer) is processed to be used as an output.

This process is shown in Figure 7.3. Refer to [Wong et al., 2012b] for more details on the procedure.
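The Candidate Scoring and Retrieval step can be sketched as follows. This is a minimal illustration rather than the Toy's implementation: the function names, the dictionary layout of a QA-fragment, the character-level distance, and the lower-casing of both strings are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def select_output(user_input, fragments, contrast_with="question"):
    """Pick the fragment whose contrastive component is closest to the input
    and return its other component (hypothetical helper, not the Toy's API)."""
    other = "answer" if contrast_with == "question" else "question"
    best = min(fragments,
               key=lambda f: levenshtein(user_input.lower(), f[contrast_with].lower()))
    return best[other]


fragments = [
    {"question": "can lions swim", "answer": "Yes, lions can swim."},
    {"question": "what do zebras eat", "answer": "Zebras eat grass."},
]
print(select_output("can a lion swim", fragments))
```

Here the question is used as the contrastive component, so the answer of the closest fragment is returned; passing `contrast_with="answer"` reverses the roles.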

Recall that the conversational fragments used by the Toy were mined from the web in the form of Question-Answer pairs (QA-fragments) [Wong et al., 2012b]. These pairs address the problem of content construction found for the conversational fragments used in Chapter 4, providing module designers with a pool of potential system utterances to address user inputs instead of having to manually construct them.

7.2.2 The Semantic Relatedness Selection Mechanism

Based on the findings of Chapter 3, we propose extending the output selection process by calculating the semantic relatedness between concepts contained in both the conversation history and candidate outputs. This mechanism, labelled the Semantic Relatedness Selection Mechanism, modifies aspects of the implementation of the Wong et al. [2012b] Selection Mechanism:

1. A process of context expansion based on semantic relatedness: this process can be used to increase the quality of a set of candidate outputs for finding alternative conversation topics. Using this expansion, we aim to avoid cases where keywords are recognised by the system but no conversational fragment is associated with these keywords.

2. Assessing semantic relatedness between the conversational context and candidate outputs: as in the experiment conducted in Chapter 3, this aspect measures the semantic relatedness between the conversational topics appearing in both the candidate outputs and current context. This analysis extends the current implementation that only considers context terms’ occurrences in the candidate outputs and the edit distance between the user input and each candidate output.

Figure 7.3 shows the process performed in the original version of the Toy and how these tasks fit into it. It must also be acknowledged that, for point 2 above, there is a risk of creating a combinatorial set of concept pairs for which semantic relatedness must be calculated. This threatens the requirement that the conversational agent respond within a reasonable amount of time (about three seconds). We describe these areas in detail below.

7.2.2.1 Context Expansion

Users interact with the conversational system via questions or assertions. Depending on the input, the Toy determines the course of the interaction and how the output is processed [Wong et al., 2012b]. Topics detected in the user input help in determining the topics in the output. Having a context expansion mechanism helps in two cases: first, it can expand the selection of related topics when only one major topic is mentioned by a user; second, it can divert the conversation when there are no conversational fragments about the main topic (for instance, when all of these have been exhausted).

This addition of concepts can also enrich the context with additional topics that the user may pursue once the Toy mentions them. For example, rather than responding with an output only about “lions”, the system can respond to the user with the following output: “Can a lion run faster than a tiger?”. This helps the Toy introduce novelty and surprise into a conversation.

[Figure: User Input → (1) User Input Analysis → (2) Candidate Selection → (3) Candidate Scoring and Retrieval → System Output, extended with the Context Expansion and Semantic Relatedness Assessment steps.]

Figure 7.3: The current process conducted by the Toy is enumerated in red, while the extended steps of the Semantic Relatedness Module are shown in yellow.

We show an example of context expansion using the main topic lion in Figure 7.4. In the example shown, topics related to the main topic lion are scored in terms of their semantic relatedness using the learned hybrid metric from Section 6.7. This means that the system determines their relatedness by analysing their conceptualisations in the corresponding M-Onto, in this case the Zoo M-Onto.

This context expansion is triggered if the Toy detects only one conversational topic in the user input or, more generally, if the number of detected topics is below a maximum threshold n, which is a parameter set in the Toy configuration.
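The expansion step can be sketched as below, using the relatedness scores listed in Figure 7.4. The sketch rests on several assumptions: the trigger is read as "fewer than n detected topics", the cut-off value 1.39 is hypothetical (the figure shows only where the threshold falls, between lioness and zebra), and the function and parameter names are invented for illustration.

```python
# SemRel(lion, y) scores as listed in Figure 7.4.
SEMREL_LION = {
    "leopard": 1.88949, "cheetah": 1.63414, "hyena": 1.47294,
    "elephant": 1.41098, "lioness": 1.39421, "zebra": 1.35889,
    "antelope": 1.33788, "giraffe": 1.32122,
}


def expand_context(topics, related, max_topics=2, min_score=1.39):
    """Add related topics scoring at least min_score when fewer than
    max_topics topics were detected; otherwise return the topics unchanged.
    Both max_topics (the threshold n) and min_score are assumed values."""
    if len(topics) >= max_topics:
        return list(topics)
    ranked = sorted(related.items(), key=lambda item: item[1], reverse=True)
    extra = [topic for topic, score in ranked
             if score >= min_score and topic not in topics]
    return list(topics) + extra


print(expand_context(["lion"], SEMREL_LION))
```

With a single detected topic the context grows to include leopard, cheetah, hyena, elephant and lioness; with two or more detected topics it is left untouched.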

7.2.2.2 Output Selection Using Semantic Relatedness

[Figure: graph of topics around Lion (Big cat, Lioness, Elephant, Liger, Lionet, Animal, Hyena, Leopard, Giraffe, Antelope, Zebra, Cheetah), linked by is-a relations and Wikilink relationships, together with the following table.]

x       y          SemRel(x, y) score
lion    leopard    1.88949
        cheetah    1.63414
        hyena      1.47294
        elephant   1.41098
        lioness    1.39421
*** THRESHOLD ***
        zebra      1.35889
        antelope   1.33788
        giraffe    1.32122
        . . .

Figure 7.4: Context expansion using the input term “lion”. The dotted line in the table represents the threshold for selected topics.

This mechanism uses the measure proposed by Lapata and Barzilay [2005] from the semantic view of coherent texts (Equation 7.1), where the measure of semantic relatedness is replaced by the learned hybrid metric from Section 6.7. The Toy selects an output from a set of 20 candidate QA-fragments previously selected by the Wong et al. [2012b] Selection Mechanism, ranked in terms of the similarity between one of their components (either the question or the answer, which we refer to as input candidate and output candidate for this purpose) with respect to the user input. The result of Equation 7.1 is applied to both the input and output candidate in the QA-fragment, and then included in the following ranking equation [Wong et al., 2012b]:

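$$
\mathit{score}(T_r) = \delta \times \big(\mathit{score}(I_r) \times i_f + \mathit{score}(O_r) \times o_f + \mathit{diff}(I_r, \iota) \times \mathit{diff}_f\big) + \big(0.5 \times \mathit{Rel}_I(\kappa, T_r)\big) + \big(0.5 \times \mathit{Rel}_O(\kappa, T_r)\big) \tag{7.2}
$$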

where $\mathit{score}(I_r)$ and $\mathit{score}(O_r)$ represent the overlap of words contained by the input and output candidate sentences with respect to the context of the conversation, respectively. These scores are ranked using the tf×idf statistic. The component $\mathit{diff}(I_r, \iota)$ stands for the string similarity (using the edit distance [Levenshtein, 1966]) between the input candidate sentence and the last user input. The components $\mathit{Rel}_I$ and $\mathit{Rel}_O$ correspond to the semantic relatedness scores obtained by applying Equation 7.1 to the input and output candidates, respectively.

Setting     Average time   Standard deviation   Minimum time   Maximum time
TOY         0.12           0.04                 0.04           0.22
TOY + SR    66.01          160.49               3.09           840.69

Table 7.1: Execution time (in seconds) measured for 50 turns with an output retrieved, for the original and the extended Toy settings.
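To make the ranking in Equation 7.2 concrete, the sketch below combines already-computed component values for each candidate and picks the highest-scoring one. It is an illustration under stated assumptions: the weights δ and the three factor weights default to 1.0 because their values are not given here, the component values in the example are invented, and the function and field names are hypothetical.

```python
def rank_score(score_in, score_out, diff_in, rel_in, rel_out,
               delta=1.0, w_in=1.0, w_out=1.0, w_diff=1.0):
    """Equation 7.2 with precomputed components.
    delta, w_in, w_out and w_diff are placeholder weights (assumed values)."""
    lexical = score_in * w_in + score_out * w_out + diff_in * w_diff
    return delta * lexical + 0.5 * rel_in + 0.5 * rel_out


def pick_best(candidates, **weights):
    """Return the text of the candidate with the highest Equation 7.2 score."""
    best = max(candidates, key=lambda c: rank_score(*c["features"], **weights))
    return best["text"]


# Invented component values: (score_in, score_out, diff_in, rel_in, rel_out).
candidates = [
    {"text": "Can a lion run faster than a tiger?", "features": (0.2, 0.4, 0.1, 1.0, 2.0)},
    {"text": "Do zebras live in herds?", "features": (0.5, 0.5, 0.5, 0.1, 0.1)},
]
print(pick_best(candidates))
```

Because the two relatedness terms sit outside the δ-weighted group, lowering `delta` shifts the ranking towards semantic relatedness and away from lexical overlap and string similarity.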

In document Domain-sensitive topic management in a modular conversational agent framework (Page 184-189)