Question-AnsweringUsingExample Q&APairs
KyosukeYoshida,TaroUeda,MadokaIshioroshi,HideyukiShibuki,andTatsunoriMori
GraduateShoolofEnvironmentandInformationSienes,YokohamaNationalUniversity
79-7Tokiwadai,Hodogaya-ku,Yokohama240-8501,Japan
fkyoshida, kks, ishioroshi, shib, morigforest.eis.ynu.a.jp
Abstrat
Inthispaper,weproposeamethodwhih uti-lizes aprobabilisti language modelin non-fatoidtypequestion-answeringsystemin or-dertoimproveitsauray. Themodelis a mixtureprobabilistilanguagemodelof part-of-speehandsurfaeexpressions. We intro-duedthemodelintotwosub-proesseswhih alulatesimilarityoftextsintermsofwriting style.Therstproessolletsexample ques-tionssimilartoasubmittedquestion.The se-ondonemeasures similarity betweenan an-swer andidate and example answers paired withtheolletedexamplequestions. Experi-mentalresultsshowedthattheaurayofthe systemwasimprovedbyintroduingthe pro-posedmethod.
1 Introdution
In reent years, the amount of data available on theWebisinreasing bygrowingomputer
perfor-maneand networktraf. Therefore,tehnologies
thatgiveus aesstoneessaryinformation inthe largeamountofdataarerequired. Oneofsuh teh-nologiesisquestion-answering(QA),whihisto ex-tratananswerforaquestionwritteninnatural
lan-guagefromsoure douments. Ingeneral,QA
sys-tems are ategorized into the following two types:
fatoidandnon-fatoid(Fukumoto,2007).Wefous
on the non-fatoid type QAin this paper. Table 1 showssome typial types ofnon-fatoidquestions. Theappropriatenessoftheanswerandidates is of-tenestimatedonthebasisoffollowingtwomeasures (Hanetal.,2006).
Measure1:Relevanetothetopiofthequestion, how relevant is the andidate tothe topi of thequestion?
Measure2:Appropriatenessofwritingstyle, howwelldoestheandidatesatisfythewriting stylethatisappropriateforanswersofthelass ofthegivenquestion?
Here, by the term writing style, we refer to the styleofexpressions peuliar toa lassofquestions and their answers, asshown in Table 1. Although
these two measures depend on eah other tosome
extent,weassumethattheyareindependent inthis study.
Non-fatoidtypeQAsystemsareategorizedinto
the following two types aording to how to
han-dle Measure 2. The rst type lassies
submit-tedquestions intoseveralpredenedquestiontypes
suh as denition-type, why-type, how-type, and
so on, in order to separately handle eah type of questions by different methodologies. Han et al.
(2006) alulated the above-mentioned two
mea-sures fordenition-type questions based on proba-bilisti models built from orpora. The model for
Measure 1 isalulated from retrieved douments.
Themodel forMeasure 2isalulated froma
or-pus of denitions. However, this type of systems
hassome difulties as follows. Sine the lasses
of non-fatoid questions are not well dened, it is
difult to distinguish and dene all lasses
om-prehensively. Moreover, theauray ofaquestion
lassier affets the overall auray of
question-answering, beause mislassied questions are
in-orretlyrouted toanansweringmodule for differ-entlasses.
The seond type of systems handles submitted
questions based on a unied framework without
question lassiation. Mizuno et al. (2009)
pro-poseda methodthat isable toalulateMeasure 2
withoutlassiation ofquestions. Usingexample
Typeofquestion Examplesoftypialwritingstyle
Question Answer
Denition-type tte-nani(Whatis) towa dearu(is)
Why-type Naze(Why) tame(Beause)
How-type suru-niwadou-shitaraii(HowanIdo) suru-niwamazu (Inordertodo,)
Othertypes X-toY-nohigaiwanani X-wa-daga,Y-wa
(WhatisthedifferenebetweenXandY) (WhileXis,Yis)
of a given answer andidate is onsistent with the lassofa submittedquestion. Byusingthis lassi-er,Measure2isrealized withoutquestion lassi-ation. Soriutetal. (2006)alsoproposed asystem
without question lassiation. They introdued a
statistial translation model between questions and theorrespondinganswersinordertobridgethe lex-ial gapbetween thequestions and theanswers. A set of example Q&Apairs from FAQ sites on the Webisusedfortheestimationofthemodel.
In these methods, the length of answers should bepredetermined. Thelengthofanswersannotbe hangeddynamiallyandisneessarytobeestimate fromthelengthofthequestion.
Therefore, Morietal.(2008)proposed amethod of the seond type approah that is able to adap-tively determine the lengthofan answer andidate aordingtoasubmittedquestion. Theyuse exam-pleQ&Apairsona Q&Aommunitysiteinorder tondappropriatewritingstylestoanswersfor sub-mittedquestions. Theyutilizesimplen-grammodel asfeatures toretrieve examplequestions similar to a submitted question in terms of writing style and tond appropriate writingstyles to answer. How-ever, thesimple n-grammodelisnotappropriateto
modeldependeny among words thatappear inthe
distant positionsbeause it onlyaptures linguisti phenomenathatappearwithinthen-wordswindow. Therefore,sometimestheseletionofexample ques-tionsisnotarried outorretly. Thereexist some inorretly retrieved examplequestions thatarenot similartothewholesubmittedquestionintermsof writingstyle,whilethosen-gramshappentobevery similartothen-gramsofthequestion. Theirmethod ofsoringanswerandidateisbasedonanaive
fre-queny model of word 2-grams as feature
expres-sions. Therefore, ungrammatial sentenes, whih
oftenappearinWebdoumentsandarenotsuitable
to answer andidates, happen to have high sores
whentheyhavethefeatureexpressions. Itderease
theaurayofthesystem.
Inthispaper,weemploythemethodofMorietal. (2008)asabaseline method. Weintrodue a prob-abilisti language model tothe baseline method in ordertosolve theabove problems and improve the
methodintermsofauray.
Ourmethodhasthefollowingthreefeatureparts. Firstly,aprobabilistilanguagemodelisusedto re-trieve examples similar to an submitted question. Seondly, another probabilisti language model is
onstruted from the retrieved example answers,
whihisusedtomeasuretheappropriatenessof an-swer andidates for submitted questions. Finally,
the answer andidates are lustered into several
groups,andtheandidatesthathaveunsuitable writ-ingstylesasanswersforthesubmittedquestionare removed.
Therest ofthis paperisorganized asfollows. In Setion2, weexplaintherelatedworks. InSetion 3,weexplaintheoutlineofthebaselinemethod. In Setion4, we disuss theproblems of thebaseline method. InSetion5, wedesribe thedetailofthe
proposed method. InSetion6, weondut
exam-inations ofourQAsystem, and disuss theresults. InSetion7,weprovideouronlusion.
2 RelatedWorks
The methods whih utilize probabilisti language
Reduce the number
of example questions
by using 7-gram
Q
A
Q
A
Q
A
Q
A
Q
A
Q
Q
Q
A
Q
A
Q
A
Q
A
Q
A
Q
Q
A
Q
A
Input Question
Qin
Retrieving
1500 example
questions
Select the combination
of clusters whose
language model gives
the highest generation
probability to Qin
Selected
Combination
of Clusters
Retrieved question
examples along with
their paired answer
examples
Clustering
A
Q
A
Q
A
A
Q
A
Q&A Corpus
Figure4: Retrievingquestionexamples(alongwith an-swerexamples)similartoansubmittedquestioninterms ofwritingstyle
plequestions retrieved inthis step threetimes oftargetnumber,inourexperiment.
3. Applyalusteringalgorithmtoexample ques-tionsextrated intheabovestep 2,and obtain severallusters.
4. ObtainombinationsoflustersreatedinStep
3. Generate a probabilisti language model
fromthe example questions ineah
ombina-tionoflusters.Calulatethegeneration proba-bilityofthesubmittedquestionforeahmodel. Obtaintheombination oflusterswhose lan-guagemodelgivesthehighestprobabilitytothe submittedquestion.
The reason why we divide examples into some
lustersistoshortentheproessing time ompared toalulating forall sebsets ofexample questions. TheoutlineofthisproessingisshowninFigure4.
5.2.1 ClusteringExampleQ&APairs
AsdesribedlaterinSetion5.3,wenallyneed toobtainexampleanswerspairedwiththeexample questionsthataresimilartothesubmittedquestion.
The lustering proess desribed above is for not
only example questions but also example answers, namely,forexampleQ&Apairs. Inorderto alu-late similarity between sentenes for lustering by
taking aount of word o-ourrene in distane
positionsofa sentene, weuseword skip2-grams assentene features for lustering. A skip 2-gram isanypairofwordsintheirsentene order. Itmay havesomegapsbetweentwowords. Bothquestion
examples and answerexamples are generalized for lustering (notfor obtaining probabilisti language models) as follows: 1) the following surfae ex-pressionsareusedastheyare: thefuntionalwords (e.g.interrogativespartilesandauxiliaryverbs)and somepredeterminedontentwordsdesribedbelow, 2) the other words are replaed with their
part-of-speeh tags. The predetermined ontext words
in-ludes a) words that tend to express the fous of question(e.g.riyuu(reason),houhou(method), imi(meaning),higai(differene)),andb)verbs andadjetives that frequentlyappearinorpus. As the words expressing the fouses ofquestions, we olletnounsXthatfrequentlyappearinthe follow-ingontexts oforpus: ...X-wanan-desuka (What isX of ...), ... X-wo oshiete (Tell me Xof ...), andsoon.
There are, at least, following three hoies for
similarity alulation when we luster example
questionsandanswersintosomelusters.
Similarity1
SimilaritybetweenQ&Apairsintermsofskip 2-grams. Wetakeaount ofboththequestion partand theanswerpartofaQ&Apair simul-taneously.
Similarity2
Similarity between example questions only in termsofskip2-grams.
Similarity3
Similarity between example answers only in
termsofskip2-grams.
Inthe alulation of Similarity 1, we alulate the similarity of the question parts and that of the an-swerpartsseparately, then mix thevalues into one similarity,beause thefeatureexpressions fromthe answerpartsshouldbetreatedindependentofthose ofthequestionparts,andvieversa.Asthe luster-ingalgorithm,weemployedthek-meansmethod.
5.2.2 ObtaintheOptimalCombinationsof
Clusters
We employed a simple hill limbing method to
retrieve the optimal ombination of lusters whose language model of question parts gives the maxi-mal generation probability to the submitted ques-tion.WeuseEquation(5)toalulatethegeneration probabilityandtheombinationisgreedilysearhed throughthefollowingsteps.
1. LetthelustersetCLbethegivenlusterset, andlettheandidatesetCAbeanemptyset.
Proposedmethod (=0:7)
Proposedmethod (=0:8)
Proposedmethod (=0:9)
Baseline (=0:5)
TypeofQuestion MRR
Numberof
CorretResponse MRR
Numberof
CorretResponse MRR
Numberof
CorretResponse MRR
Numberof CorretResponse
Denition 0.458 6/10 0.483 6/10 0.500 6/10 0.425 6/10
Why 0.332 9/17 0.345 8/17 0.345 9/17 0.240 6/17
How 0.400 3/3 0.611 3/3 0.511 3/3 0.111 1/3
Other 0.527 13/20 0.543 14/20 0.502 14/20 0.412 14/20
All 0.439 31/50 0.464 31/50 0.437 32/50 0.338 27/50
ofMeasure1andMeasure2inEquation(7)and2) thesimilarity alulation methods inthelustering desribedinSetion5.2.
6.1 ExperimentalSettings
Asthequestionset,weusethelatterhalfofJapanese question set of NTCIR-6 QAC formal run test set (Fukumotoetal.,2007).
AsaWebsearhengineforinformationsoureof
QA,we adopted Yahoo! Japan API
1
. With regard toQ&Aexamples,weusedaorpus of0.9 million Q&ApairsthatomesfromYahoo! Chiebukuro, whih isa Q&A ommunity site and the Japanese
version of Yahoo! answers. Let the
parame-ter target numberdesribed inSetion 5.2 be 500.
The systems output ve answers for eah
submit-tedquestioninthedesendingorderofsore. Judg-mentwhether ananswer andidateisorret ornot isperformedby one assessor. Theassessor judged
an output answer andidate orret, when the
an-didate inludes orret answer for the question as
itspart. WeuseMeanReiproal Rank(MRR
2 )as theevaluation metris.InadditiontoMRR,wealso investigate the number of the questions for whih the system anreturn, at least, one orret answer inthetopveanswerandidates(numberoforret responses,hereafter).
6.2 ExperimentalResults
ExperimentalresultsareshownintheTable2,3,and 4.
Withregardtothebaselinemethod,weemployed 0.5fortheparameter,beauseitgivesthebest
per-formane in terms ofMRR.On the other hand, as
forthe proposed method, the results areshown for the three settings, =0.7,0.8,and 0.9, whih give thebetterperformanethanothersettings.
1
http://developer.yahoo.o.jp/ 2
ReiproalRank(RR)istheinverseoftherankoftherst orretanswerandidate.MRRistheaverageofRRsoverthe questionset.
Although the proposed method and the baseline
methoddo notperform anyquestion lassiation,
theresults areshownonatype-by-type basisin or-dertoinvestigatetheeffetivenessofthemethodfor eahtypialquestiontypedesribedinTable1.
6.3 Disussion
All of Table 2, 3, and 4 show that the proposed methodoutperformsthebaselinemethod.
With regard to the numberof orret responses,
theproposed method givesmore orret responses
thanthebaselinemethodexeptfortheaseofuse ofSimilarity1(usingbothquestionpartandanswer part)and =0:7.
WithregardtoMRR,theproposed methodgives betterperformanethanthebaselinemethodfornot onlythe average of allquestions but also the aver-age of eah type of question. One of the reasons for thegood performane may be the fat that the proposemethodanappropriatelylterout ungram-matialexpressions inanswerandidates,whilethe baselinemethodsometimesemploythemasanswer response. It means that the introdued probabilis-tilanguage modelontributetoremoving
ungram-matial text fromanswer andidates. Another one
ofthereasonsforthegoodperformanemaybethe
fatthat theproposed methodanredue the
num-berofexampleQ&Apairswhihinludeunsuitable expressions for answers of the submitted question when the system retrieves example Q&Apairs. It means that more example answers suitable to the submitted questionan be retrieved byintroduing thelusteringandtheprobabilistiestimationtothe proessofretrievingexamplequestions,andasa re-sult,byreningthelanguagemodelofanswers.The followingshowsanexampleforwhihthebaseline methodannotgiveorretanswer,buttheproposed
Question(submitted)
WhatisrequiredtoeffetuatetheKyoto Proto-ol?(OriginallyinJapanese)
Answer(Baseline)
After deposit of instrument of ratiation of
Kyoto protool by the Russian goverment, a
ondition for ratiation is satised, it is ef-fetuated onFebruary16, 2005. (Originally in Japanese)
Answer(Proposedmethod)
In order to effetuate the Kyoto Protool, the ratiationbymorethan50signatoryountries and ontrieswhosearbon-dioxide emissionis
more than 55% of advaned industrial
oun-tries'areneeded.(OriginallyinJapanese)
Withregardtothemethodsof similarity alula-tioninlustering exampleQ&Apairs,Similarity 1 (usingbothquestionpartandanswerpart)generally givesbetterperformanethanothersimilarity alu-lationmethods intermsofboththenumberof
or-retresponse andMRR.Thefollowingreason may
besupposed.
The features from question parts of retrieved Q&Aexamplesseemnottobesuitablefor lus-teringtheQ&Aexamples beause the writing stylesofquestionpartsareverysimilartoeah other on aount of the method for retrieving Q&Aexamples. Inorder to retrieve example questionssimilartothesubmittedquestion,we
use the 7-gram in eah question part whose
enterwordisaninterrogative.
Sineanswerpartshavelonger textthan ques-tion parts in Q&A examples and are onse-quently desribed invarious writing styles, it
may be possible to nd subgroups of answer
parts aording to the variations of writing
styles.
Forthesereasons,theuseofanswerpartsofQ&A examples ismore efientforlustering the exam-ples. Although thereisnosigniantdifferene be-tween Similarity 1 and Similarity 3 (using answer partonly)asshowninTable2and4,thesystemwith Similarity1 (=0.9)stably outperforms thesystem withSimilarity 3interms ofthenumberof orret response. Moreover,MRRofthesystemwith Simi-larity1,0.482,isalmostthesameasthebest perfor-mane,0.483,amongallsettings.
7 Conlusion
Inthis study, we poposed a method to introdue a probabilistilanguagemodelintonon-fatoid ques-tionanswering inorder toimprovethe auray ot thesystemproposedbyMorietal.(2008)
Weintrodued themodelintotwosub-proesses
whihalulate similarityintermsof writingstyle. Therstproessolletsexamplequestionssimilar toansubmittedquestion. Theseondonemeasures similarity between an answer andidate and exam-pleanswerspairedwiththeolletedexample ques-tions.Theexperimentalresultsshowedthatthe sys-temwiththepropose methodoutperforms the base-linesystem.
Referenes
Fukumoto,J. 2007. Question AnsweringSystem for Non-fatoidType Questions and Automati Evalua-tion basedonBE Method. Proeedingsofthe Sixth NTCIRWorkshopMeeting,Tokyo,Japan,441447. Fukumoto,J.,T.Kato,F.MasuiandT.Mori. 2007. An
Overviewof the 4th Question AnsweringChallenge (QAC-4)atNTCIRWorkshop6. Proeedingsofthe SixthNTCIRWorkshopMeeting,433440.
Han,K.-S.,Y.-I.SongandH.-C.Rim.2006. Probabilis-ti modelfor denitional question answering. Pro-eedingsofthe29thannualinternationalACMSIGIR onfereneonResearhanddevelopmentin informa-tionretrieval,SIGIR'06,NewYork,212219. Heie,M.H.,E.W.D.WhittakerandS.Furui. 2012.
Ques-tion answeringusing statistial language modelling. ComputerSpeehandLanguage26,193209. Mizuno,J.,T.Akiba,A.FujiiandK.Itou. 2009.
Non-fatoidQuestionAnsweringExperimentsat
NTCIR-6:Towards Answer Type Detetion for Realworld
Questions.ProeedingsoftheSixthNTCIRWorkshop, 487492.
Mori,T.,M.SatoandM.Ishioroshi. 2008. Answering anylass ofJapanese non-fatoidquestionby using theWebandexampleQ&ApairsfromasoialQ&A website. Proeedingsofthe2008IEEE/WIC/ACM In-ternationalConfereneonWebIntelligeneand Intel-ligentAgentTehnology,5965.
Soriut, R., T.Akiba and E. Brill. 2006. Automati QuestionAnsweringUsingtheWeb:Beyondthe Fa-toid. JournalofInformationRetrieval-SpeialIssue onWebInformationRetrieval,vol.9,191206.
Takahashi, A., A. Takatsu and J. Adahi. 2010.