
Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions




Seokhwan Kim, Rafael E. Banchs, Haizhou Li

Human Language Technology Department, Institute for Infocomm Research, Singapore 138632



Abstract

Dialogue topic tracking aims at analyzing and maintaining topic transitions in on-going dialogues. This paper proposes to utilize Wikification-based features for providing mention-level correspondences to Wikipedia concepts for dialogue topic tracking. The experimental results show that our proposed features can significantly improve the performances of the task in mixed-initiative human-human dialogues.

1 Introduction

Dialogue topic tracking aims at detecting topic transitions and predicting topic categories in on-going dialogues which address more than a single topic. Since human communications in real-world situations tend to consist of a series of multiple topics even for a single domain, tracking dialogue topics plays a key role in analyzing human-human dialogues as well as improving the naturalness of human-machine interactions by conducting multi-topic conversations.

Some researchers (Nakata et al., 2002; Lagus and Kuusisto, 2002; Adams and Martell, 2008) attempted to solve this problem with text categorization approaches for the utterances in a given turn. However, these approaches can only be effective for the cases when users mention the topic-related expressions explicitly in their utterances, because the models for text categorization assume that the proper category for each textual unit can be assigned based only on its own contents.

The other direction of dialogue topic tracking made use of external knowledge sources including domain models (Roy and Subramaniam, 2006), heuristics (Young et al., 2007), and agendas (Bohus and Rudnicky, 2003; Lee et al., 2008). While these knowledge-based methods have the advantage of dealing with system-initiative dialogues by controlling dialogue flows based on given resources, they suffer from low flexibility in handling the user's responses and high costs for building the resources.

Recently, we have proposed to explore domain knowledge from Wikipedia for mixed-initiative dialogue topic tracking without significant costs for building resources (Kim et al., 2014a; Kim et al., 2014b). In these methods, a set of articles that have similar contents to a given dialogue segment are selected using a vector space model. Then various types of information obtained from the articles are utilized to learn topic trackers based on kernel methods.

In this work, we focus on the following limitations of our former work in retrieving relevant concepts at a given turn with the term vector similarity between each pair of dialogue segment and Wikipedia article. Firstly, the contents of conversation could be expressed in totally different ways from the descriptions in the actual relevant articles in Wikipedia. This mismatch between spoken dialogues and written encyclopedia could bring about inaccuracy in selecting proper Wikipedia articles as sources for domain knowledge. Secondly, a set of articles selected by comparing with a whole dialogue segment is limited in reflecting multiple relevances if more than one concept is actually mentioned in the segment. Lastly, the lack of semantic or discourse aspects in concept retrieval could limit the capability of the tracker to deal with implicitly mentioned subjects.

To solve these issues, we propose to incorporate Wikification (Mihalcea and Csomai, 2007) features for building dialogue topic trackers. The goal of Wikification is resolving ambiguities and variabilities of every mention in natural language by linking the expression to its relevant Wikipedia concept. Since this task is performed using not only surface form features, but also various types of semantic and discourse aspects obtained from both given texts and Wikipedia collection, our proposed method utilizing the results from Wikification contributes to improving the tracking performances compared to the former approaches based on dialogue segment-level correspondences.

| t | Speaker | Utterance | Topic Transition |
|---|---------|-----------|------------------|
| 0 | Guide | How can I help you? | NONE→NONE |
| 1 | Tourist | Can you recommend some good places to visit in Singapore? | NONE→ATTR |
|   | Guide | Well if you like to visit an icon of Singapore, Merlion park will be a nice place to visit. | |
| 2 | Tourist | That is a symbol for your country, right? | ATTR→ATTR |
|   | Guide | Yes, we use that to symbolise Singapore. | |
| 3 | Tourist | Okay. | ATTR→ATTR |
|   | Guide | The lion head symbolised the founding of the island and the fish body just symbolised the humble fishing village. | |
| 4 | Tourist | How can I get there from Orchard Road? | ATTR→TRSP |
|   | Guide | You can take the red line train from Orchard and stop at Raffles Place. | |
| 5 | Tourist | Is this walking distance from the station to the destination? | TRSP→TRSP |
|   | Guide | Yes, it'll take only ten minutes on foot. | |
| 6 | Tourist | Alright. | TRSP→FOOD |
|   | Guide | Well, you can also enjoy some seafoods at the riverside near the place. | |
| 7 | Tourist | What food do you have any recommendations to try there? | FOOD→FOOD |
|   | Guide | If you like spicy foods, you must try chilli crab which is one of our favourite dishes here. | |
| 8 | Tourist | Great! I'll try that. | FOOD→FOOD |

Figure 1: Examples of dialogue topic tracking on Singapore tour guide dialogues

2 Dialogue Topic Tracking

Dialogue topic tracking can be defined as a classification problem to detect where topic transitions occur and which topic category follows each transition. The most probable pair of topics just before and after each turn is predicted by the following classifier:

$$f(x_t) = (y_{t-1}, y_t),$$

where $x_t$ contains the input features obtained at a turn $t$, $y_t \in C$, and $C$ is a closed set of topic categories. If a topic transition occurs at $t$, $y_t$ should be different from $y_{t-1}$. Otherwise, both $y_t$ and $y_{t-1}$ have the same value.

Figure 1 shows an example of dialogue topic tracking in a given dialogue fragment in the Singapore tour guide domain between a tourist and a guide. This conversation is divided into four segments, since $f$ detects three topic transitions at $t_1$, $t_4$ and $t_6$. The mixed-initiative aspects are also shown in this dialogue, because the first two transitions are initiated by the tourist, while the other one is driven by the guide without any explicit request from the tourist. From these results, we could obtain a topic sequence of 'Attraction', 'Transportation', and 'Food'.
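The relation between turn-level topic labels and the pairs predicted by the classifier can be illustrated with a small sketch. The function names and the choice of `NONE` as the label before the first turn are assumptions made for illustration; the label sequence is read off the Topic Transition column of Figure 1.

```python
from typing import List, Tuple


def transition_pairs(labels: List[str], initial: str = "NONE") -> List[Tuple[str, str]]:
    """Build the pair (y_{t-1}, y_t) for every turn t.

    `initial` stands in for the label before the first turn (an assumption).
    """
    pairs = []
    prev = initial
    for cur in labels:
        pairs.append((prev, cur))
        prev = cur
    return pairs


def transition_turns(pairs: List[Tuple[str, str]]) -> List[int]:
    """Return the turn indices at which the topic changes."""
    return [t for t, (before, after) in enumerate(pairs) if before != after]


# Topic labels y_0 .. y_8 as shown in Figure 1.
labels = ["NONE", "ATTR", "ATTR", "ATTR", "TRSP", "TRSP", "FOOD", "FOOD", "FOOD"]
pairs = transition_pairs(labels)
print(transition_turns(pairs))  # [1, 4, 6]
```

The three detected indices split the dialogue into four segments, matching the description of Figure 1.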

| t | Speaker | Mention | Wikipedia Concept |
|---|---------|---------|-------------------|
| 1 | Tourist | Singapore | Singapore |
|   | Guide | Singapore | Singapore |
|   |       | Merlion park | Merlion Park |
| 2 | Tourist | That | Merlion |
|   |       | your country | Singapore |
|   | Guide | that | Merlion |
|   |       | Singapore | Singapore |
| 4 | Tourist | there | Merlion Park |
|   |       | Orchard Road | Orchard Road |
|   | Guide | red line train | North South MRT Line |
|   |       | Orchard | Orchard MRT Station |
|   |       | Raffles Place | Raffles Place MRT Station |
| 5 | Tourist | the station | Raffles Place MRT Station |
|   |       | the destination | Merlion Park |
| 6 | Guide | seafoods | Seafood |
|   |       | the riverside | Singapore River |
|   |       | the place | Merlion Park |
| 7 | Tourist | there | Singapore River |
|   | Guide | chilli crab | Chilli crab |
|   |       | here | Singapore |

Figure 2: Examples of Wikification on Singapore tour guide dialogues

3 Wikification of Concept Mentions in Spoken Dialogues

Wikification aims at linking mentions to the relevant entries in Wikipedia. As shown in the examples in Figure 2 for the dialogue in Figure 1, this task is performed by dealing with co-references, ambiguities, and variabilities of the mentions.

Following most previous work on Wikification (Bunescu and Pasca, 2006; Mihalcea and Csomai, 2007; Milne and Witten, 2008; Dredze et al., 2010; Han and Sun, 2011; Chen and Ji, 2011), this work also adopts a supervised learning-to-rank algorithm for determining the most relevant concept for each mention in transcribed utterances.

In this work, every noun phrase in a given dialogue session is defined as a single mention. To capture more abstract concepts, we take not only named entities or base noun phrases, but also every complex or recursive noun phrase in a dialogue as the instance to be linked. For each mention, a set of candidates is retrieved from a Lucene index built on the whole Wikipedia collection, divided at the section level. The ranking score $s(m, c)$ for a given pair of a mention $m$ and its candidate concept $c$ is assigned as follows:

$$
s(m, c) =
\begin{cases}
4 & \text{if } c \text{ is exactly the same as } g(m), \\
3 & \text{if } c \text{ is the parent article of } g(m), \\
2 & \text{if } c \text{ belongs to the same article but a different section of } g(m), \\
1 & \text{otherwise,}
\end{cases}
$$

where $g(m)$ is the manual annotation for the most relevant concept of $m$.
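A minimal sketch of this graded scoring is given below. The `Concept` type, which identifies a candidate by article title plus an optional section name, is a hypothetical representation chosen for illustration. How to score a section-level candidate against an article-level annotation is not stated in the text, so the sketch assumes it falls into the "same article, different section" case.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    """Hypothetical concept identifier: an article, optionally narrowed to a section."""
    article: str
    section: Optional[str] = None  # None means the article as a whole


def ranking_score(candidate: Concept, gold: Concept) -> int:
    """Graded relevance s(m, c) of a candidate against the annotation g(m)."""
    if candidate == gold:
        return 4  # exactly the annotated concept
    if candidate.article == gold.article:
        if candidate.section is None:
            return 3  # parent article of the annotated section
        return 2  # same article, different section (assumed for article-level gold too)
    return 1  # unrelated candidate


gold = Concept("Merlion Park", "History")
print(ranking_score(Concept("Merlion Park", "History"), gold))   # 4
print(ranking_score(Concept("Merlion Park"), gold))              # 3
print(ranking_score(Concept("Merlion Park", "Location"), gold))  # 2
print(ranking_score(Concept("Orchard Road"), gold))              # 1
```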


| Name | Description |
|------|-------------|
| SP | the speaker who spoke that mention |
| WM | word n-grams within the surface of $m$ |
| WT | word n-grams within the title of $c$ |
| EMT | whether the surface of $m$ is the same as the title of $c$ |
| EMR | whether the surface of $m$ is the same as one of the redirections to $c$ |
| MIT | whether the surface of $m$ is a sub-string of the title of $c$ |
| TIM | whether the title of $c$ is a sub-string of the surface form of $m$ |
| MIR | whether the surface of $m$ is a sub-string of a redirected title to $c$ |
| RIM | whether a redirected title to $c$ is a sub-string of the surface form of $m$ |
| PMT | similarity score based on edit distance between the surface of $m$ and the title of $c$ |
| PMR | maximum similarity score between the surface of $m$ and the redirected titles to $c$ |
| OC | whether $c$ previously occurred in the full dialogue history |
| OC_w | whether $c$ occurred within $w$ previous turns, with $w \in \{1, 3, 5, 10\}$ |

Table 1: List of features for training the ranking SVM model for Wikification
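The string-matching features in Table 1 could be computed roughly as follows. This is only a sketch: the paper does not specify case handling or how the edit distance is turned into a similarity score, so lowercasing and normalization by the longer string length are assumptions, and the function names are illustrative.

```python
from typing import Dict, List, Union


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_features(mention: str, title: str, redirects: List[str]) -> Dict[str, Union[bool, float]]:
    """Compute EMT, EMR, MIT, TIM, MIR, RIM, PMT and PMR for one (m, c) pair."""
    m, t = mention.lower(), title.lower()  # case folding is an assumption
    r = [x.lower() for x in redirects]
    return {
        "EMT": m == t,
        "EMR": m in r,
        "MIT": m in t,
        "TIM": t in m,
        "MIR": any(m in x for x in r),
        "RIM": any(x in m for x in r),
        "PMT": edit_similarity(m, t),
        "PMR": max((edit_similarity(m, x) for x in r), default=0.0),
    }


# Mention and concept taken from Figure 2; the redirect list is left empty here.
feats = match_features("Orchard", "Orchard MRT Station", [])
print(feats["EMT"], feats["MIT"], feats["TIM"])  # False True False
```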

Then, a ranking SVM (Joachims, 2002) model, a pairwise ranking algorithm learned from the ranked lists, is trained based on the scores and the features in Table 1. At execution time, the top-ranked item in the list of candidates scored by this model is considered as the result of Wikification for a given mention.
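To make the pairwise view concrete, the sketch below shows how preference pairs might be derived from the graded scores of one mention's candidate list, and how the top-ranked candidate is picked at execution time. It does not train a ranking SVM; the helper names are hypothetical and the model scores in the example are made-up placeholders.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def preference_pairs(relevance: Sequence[int]) -> List[Tuple[int, int]]:
    """Return index pairs (better, worse) for candidates with different scores.

    Ties carry no preference and are skipped, as is usual in pairwise ranking.
    """
    pairs = []
    for i, j in combinations(range(len(relevance)), 2):
        if relevance[i] > relevance[j]:
            pairs.append((i, j))
        elif relevance[j] > relevance[i]:
            pairs.append((j, i))
    return pairs


def top_ranked(candidates: Sequence[str], model_scores: Sequence[float]) -> str:
    """Pick the candidate with the highest score from a trained ranker."""
    best = max(range(len(candidates)), key=lambda k: model_scores[k])
    return candidates[best]


# Graded scores s(m, c) for four candidates of one mention.
print(preference_pairs([4, 2, 1, 1]))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]

# Placeholder ranker outputs for three candidates.
print(top_ranked(["Merlion", "Merlion Park", "Singapore"], [0.2, 1.3, -0.4]))
# Merlion Park
```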

4 Wikification-based Features for Dialogue Topic Tracking

Following our previous work (Kim et al., 2014a; Kim et al., 2014b), the classifier $f$ for dialogue topic tracking is trained on the labeled dataset using supervised machine learning techniques.

The simplest baseline is to learn the classifier based on the vector space model (Salton et al., 1975) considering bag-of-words for the terms within the given utterances. An instance for each turn is represented by a weighted term vector defined as follows:

$$\phi(x) = \left(\alpha_1, \alpha_2, \cdots, \alpha_{|W|}\right) \in \mathbb{R}^{|W|},$$

where $\alpha_i = \sum_{j=0}^{h} \left(\lambda^j \cdot \mathrm{tfidf}(w_i, u_{(t-j)})\right)$, $u_t$ is the utterance mentioned in a turn $t$, $\mathrm{tfidf}(w_i, u_t)$ is the product of term frequency of a word $w_i$ in $u_t$ and inverse document frequency of $w_i$, $\lambda$ is a decay factor for giving more importance to more recent turns, $|W|$ is the size of word dictionary, and $h$ is the number of previous turns considered as dialogue history features.
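The decayed term weighting can be sketched as below. The function names are illustrative, and the IDF values and the decay factor of 0.5 in the example are made-up numbers, not values from the paper.

```python
from typing import Dict, List


def tfidf(word: str, tokens: List[str], idf: Dict[str, float]) -> float:
    """Raw term frequency in one utterance times a precomputed IDF weight."""
    return tokens.count(word) * idf.get(word, 0.0)


def alpha_vector(vocab: List[str], history: List[List[str]],
                 idf: Dict[str, float], decay: float, h: int) -> List[float]:
    """Compute alpha_i = sum_{j=0..h} decay**j * tfidf(w_i, u_{t-j}).

    history[0] is the current utterance u_t and history[j] is u_{t-j}.
    """
    alpha = []
    for w in vocab:
        total = 0.0
        for j, tokens in enumerate(history[: h + 1]):
            total += (decay ** j) * tfidf(w, tokens, idf)
        alpha.append(total)
    return alpha


vocab = ["merlion", "train"]
history = [["take", "the", "train"],         # u_t
           ["merlion", "park", "merlion"]]   # u_{t-1}
idf = {"merlion": 2.0, "train": 1.0}         # illustrative IDF weights
print(alpha_vector(vocab, history, idf, decay=0.5, h=1))  # [2.0, 1.0]
```

With `decay` below 1, a word from an earlier turn contributes less than the same word in the current turn, which is the stated purpose of the decay factor.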

To overcome the limitations caused by lack of semantic or domain-specific aspects in the first baseline, we previously proposed (Kim et al., 2014b) to leverage Wikipedia as an external knowledge source with an extended feature space defined by concatenating the concept space with the previous term vector space as follows:

$$\phi'(x) = \left(\alpha_1, \alpha_2, \cdots, \alpha_{|W|}, \beta_1, \beta_2, \cdots, \beta_{|D|}\right),$$

where $\phi'(x) \in \mathbb{R}^{|W|+|D|}$, $\beta_i$ is the semantic relatedness between the input $x$ and the concept in the $i$-th Wikipedia article, and $|D|$ is the number of concepts in the Wikipedia collection. The value for $\beta_i$ is computed with the cosine similarity between term vectors as follows:

$$\beta_i = \mathrm{sim}(x, c_i) = \cos(\theta) = \frac{\phi(x) \cdot \phi(c_i)}{|\phi(x)|\,|\phi(c_i)|},$$

where $\phi(c_i)$ is the term vector composed from the $i$-th Wikipedia concept in the collection.
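A small sketch of the segment-level relatedness features follows. It assumes dense term vectors of equal length and returns 0 for an all-zero vector, a convention the text does not state; the function names are illustrative.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two term vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for empty vectors
    return dot / (norm_u * norm_v)


def beta_vector(x_vec: Sequence[float], article_vecs: List[Sequence[float]]) -> List[float]:
    """One relatedness value per Wikipedia article in the collection."""
    return [cosine(x_vec, c_vec) for c_vec in article_vecs]


x = [1.0, 0.0]
articles = [[2.0, 0.0], [0.0, 3.0]]
print(beta_vector(x, articles))  # [1.0, 0.0]
```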

In this work, the results of Wikification described in Section 3 are utilized to extend the feature space for training the topic tracker, instead of or in addition to the above mentioned feature values obtained from dialogue segment-level analyses. A value $\gamma_i$ in the new feature space is defined as the weighted sum of the number of mentions linked to a given concept $c_i$ within a dialogue segment as follows:

$$\gamma_i = \sum_{j=0}^{h} \left(\lambda^j \cdot \left|\left\{ m_k \in u_{(t-j)} \mid g(m_k) = c_i \right\}\right|\right),$$

where $m_k$ is the $k$-th mention in a given utterance $u$, $g(m)$ is the top-ranked result of Wikification for the mention $m$, $\lambda$ is a decay factor, and $h$ is the window size for considering dialogue history.
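The mention-count feature can be sketched as follows, using the linked concepts listed in Figure 2 for turns 4 and 5. The function name, the decay factor of 0.5, and the window size of 1 are assumptions chosen only to keep the example small.

```python
from typing import List


def gamma_vector(concepts: List[str], linked_history: List[List[str]],
                 decay: float, h: int) -> List[float]:
    """Compute gamma_i = sum_{j=0..h} decay**j * |{m_k in u_{t-j} : g(m_k) = c_i}|.

    linked_history[j] holds the top-ranked concept of every mention in u_{t-j}.
    """
    gamma = []
    for c in concepts:
        total = 0.0
        for j, linked in enumerate(linked_history[: h + 1]):
            total += (decay ** j) * sum(1 for g in linked if g == c)
        gamma.append(total)
    return gamma


# Linked concepts from Figure 2, viewed from turn t = 5.
linked_history = [
    ["Raffles Place MRT Station", "Merlion Park"],              # turn 5
    ["Merlion Park", "Orchard Road", "North South MRT Line",
     "Orchard MRT Station", "Raffles Place MRT Station"],       # turn 4
]
concepts = ["Merlion Park", "Raffles Place MRT Station", "Orchard Road"]
print(gamma_vector(concepts, linked_history, decay=0.5, h=1))  # [1.5, 1.5, 0.5]
```

Unlike the segment-level relatedness values, each mention contributes to exactly one concept, so several concepts mentioned in the same segment are all reflected in the feature vector.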

5 Evaluation


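| Features | All P | All R | All F | All ACC | Tourist P | Tourist R | Tourist F | Tourist ACC | Guide P | Guide R | Guide F | Guide ACC |
|----------|-------|-------|-------|---------|-----------|-----------|-----------|-------------|---------|---------|---------|-----------|
| α | 42.08 | 53.48 | 47.10 | 67.97 | 41.88 | 52.59 | 46.63 | 67.15 | 41.96 | 52.11 | 46.49 | 67.13 |
| α, β | 42.12 | 53.38 | 47.08 | 67.98 | 41.84 | 52.75 | 46.67 | 67.08 | 41.91 | 52.03 | 46.42 | 67.13 |
| α, γ | 47.36 | 50.19 | 48.73 | 72.38 | 46.58 | 51.09 | 48.73 | 71.99 | 47.10 | 48.44 | 47.76 | 71.94 |
| α, β, γ | 47.35 | 50.24 | 48.75 | 72.43 | 46.57 | 51.09 | 48.72 | 71.99 | 47.02 | 48.21 | 47.61 | 71.93 |
| α, γ′ | 50.77 | 49.36 | 50.06 | 79.12 | 50.51 | 49.58 | 50.04 | 81.10 | 50.94 | 49.10 | 50.00 | 78.92 |
| α, β, γ′ | 50.82 | 49.41 | 50.10 | 79.15 | 50.43 | 49.58 | 50.00 | 81.10 | 50.98 | 49.02 | 49.98 | 78.92 |

Table 2: Comparisons of the topic tracking performances with different combinations of features. The column groups All, Tourist, and Guide correspond to the schedules "All", "Tourist Turns", and "Guide Turns"; P, R, and F are transition-level scores and ACC is turn-level accuracy.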

For topic tracking, an instance for both training and prediction of topic transition was created for every utterance in the dialogues. For each instance $x$, the term vector $\phi(x)$ was generated with the $\alpha$ values from utterances within the window sizes $h = 2$ for the current and previous turns and $h = 10$ for the history turns. The $\beta$ values for representing the segment-level relevances were computed based on 3,155 Singapore-related articles which were used in our previous work (Kim et al., 2014b).

For Wikification, all the utterances were first preprocessed with the Stanford CoreNLP toolkit. Each noun phrase in the constituent trees provided by the parser was considered as an instance for Wikification and manually annotated with the corresponding concept in Wikipedia. For every mention, we retrieved the top 100 candidates from the Lucene index based on the Wikipedia database dump as of January 2015, which has 4,797,927 articles and 25,577,464 sections in total, and added one more special candidate for NIL detection. Then, a ranking function using SVMrank³ was trained on this dataset, which achieved 38.04, 31.97, and 34.74 in precision, recall, and F-measure, respectively, in the mention-level evaluation of Wikification based on five-fold cross validation. The $\gamma$ values in our proposed approach were assigned based on the top-ranked results from this ranking function for the mentions in the dialogues.

In this evaluation, the following three different schedules were applied for both training the models and predicting the topic transitions: (a) taking every utterance regardless of the speaker into account; (b) considering only the turns taken by the tourists; and (c) considering only the turns taken by the guides. While the first schedule aims at learning the human behaviours in topic tracking from the third person point of view, the others could show the tracking capabilities of the models as a sub-component in a dialogue system which acts as a guide and a tourist, respectively.

³ http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html

The SVM models were trained using SVMlight (Joachims, 1999) with different combinations of the features. All the evaluations were done in five-fold cross validation against the manual annotations with two different metrics: one is the accuracy of the predicted topic label for every turn, and the other is precision/recall/F-measure for each topic transition event that occurred either in the answer or in the predicted result.
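The two metrics might be computed along the following lines. This is a sketch under an assumed matching criterion: a predicted transition counts as correct only when the reference has the identical pair of topics at the same turn. The paper does not spell out the exact criterion, and the predicted label sequence in the example is invented for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def to_pairs(labels: List[str], initial: str = "NONE") -> List[Pair]:
    """Turn a label sequence into (y_{t-1}, y_t) pairs."""
    return list(zip([initial] + labels[:-1], labels))


def evaluate(ref: List[Pair], pred: List[Pair]) -> Dict[str, float]:
    """Turn-level accuracy and transition-level precision, recall, F-measure."""
    if len(ref) != len(pred) or not ref:
        raise ValueError("reference and prediction must be non-empty and aligned")
    accuracy = sum(1 for r, p in zip(ref, pred) if r[1] == p[1]) / len(ref)
    ref_events = sum(1 for r in ref if r[0] != r[1])
    pred_events = sum(1 for p in pred if p[0] != p[1])
    # Assumed criterion: same turn and same (before, after) topics.
    hits = sum(1 for r, p in zip(ref, pred) if p[0] != p[1] and r == p)
    precision = hits / pred_events if pred_events else 0.0
    recall = hits / ref_events if ref_events else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"P": precision, "R": recall, "F": f_measure, "ACC": accuracy}


# Reference labels follow Figure 1; the prediction delays the switch to TRSP by one turn.
ref = to_pairs(["NONE", "ATTR", "ATTR", "ATTR", "TRSP", "TRSP", "FOOD", "FOOD", "FOOD"])
pred = to_pairs(["NONE", "ATTR", "ATTR", "ATTR", "ATTR", "TRSP", "FOOD", "FOOD", "FOOD"])
scores = evaluate(ref, pred)
print(scores)
```

In this example two of the three predicted transitions match the reference, so precision and recall are both 2/3, while eight of nine turn labels are correct.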

Table 2 compares the performances of the feature combinations for each schedule. While the dialogue segment-level $\beta$ features failed to show significant improvement compared to the baseline only with term vectors, the models with our proposed Wikification-based features $\gamma$ achieved better performances in both transition and turn-level evaluations for all the schedules. The further enhancement led by the oracle features with the manual annotations for Wikification, represented by $\gamma'$, indicates that the overall performances could be improved by refining the Wikification model.

6 Conclusions

This paper presented a dialogue topic tracking approach using Wikification-based features. This approach aimed to incorporate more detailed information regarding the correspondences between a given dialogue and Wikipedia concepts. Experimental results show that our proposed approach helped to improve the topic tracking performances compared to the baselines. For future work, we plan to apply the kernel methods proposed in our previous work also on the feature spaces based on Wikification as well as to improve the Wikification model itself for achieving better overall performances in dialogue topic tracking.



References

P. H. Adams and C. H. Martell. 2008. Topic detection and extraction in chat. In Proceedings of the 2008 IEEE International Conference on Semantic Computing, pages 581–588.

D. Bohus and A. Rudnicky. 2003. Ravenclaw: dialog management using hierarchical task decomposition and an expectation agenda. In Proceedings of the European Conference on Speech, Communication and Technology, pages 597–600.

Razvan C. Bunescu and Marius Pasca. 2006. Using Encyclopedic Knowledge for Named entity Disambiguation. In EACL, volume 6, pages 9–16.

Zheng Chen and Heng Ji. 2011. Collaborative ranking: A case study on entity linking. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 771–781.

Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, and Tim Finin. 2010. Entity Disambiguation for Knowledge Base Population. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pages 277–285, Stroudsburg, PA, USA.

Xianpei Han and Le Sun. 2011. A generative entity-mention model for linking entities with knowledge base. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 945–954. Association for Computational Linguistics.

T. Joachims. 1999. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 11, pages 169–184. MIT Press, Cambridge, MA.

Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133–142.

S. Kim, R. E. Banchs, and H. Li. 2014a. A composite kernel approach for dialog topic tracking with structured domain knowledge from wikipedia. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 19–23.

S. Kim, R. E. Banchs, and H. Li. 2014b. Wikipedia-based kernels for dialogue topic tracking. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 131–135.

K. Lagus and J. Kuusisto. 2002. Topic identification in natural language dialogues using neural networks. In Proceedings of the 3rd SIGdial workshop on Discourse and dialogue, pages 95–102.

C. Lee, S. Jung, and G. G. Lee. 2008. Robust dialog management with n-best hypotheses using dialog examples and agenda. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 630–637.

Rada Mihalcea and Andras Csomai. 2007. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 233–242.

David Milne and Ian H. Witten. 2008. Learning to link with wikipedia. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 509–518.

T. Nakata, S. Ando, and A. Okumura. 2002. Topic detection based on dialogue history. In Proceedings of the 19th international conference on Computational linguistics (COLING), pages 1–7.

S. Roy and L. V. Subramaniam. 2006. Automatic generation of domain models for call centers from noisy transcriptions. In Proceedings of COLING/ACL, pages 737–744.

G. Salton, A. Wong, and C.S. Yang. 1975. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620.

