

3.1 Question Processing Module

3.1.4 Surface Pattern Matching

Because certain kinds of questions are asked very frequently in TREC, we build question patterns that map these high-frequency questions to classes, and we extract their answers using answer patterns. In this section, I briefly discuss question pattern matching and answer pattern matching in turn.

Unlike Kaisser and Becker (2004) and Wu et al. (2005), we define question classes in terms of semantic meaning rather than syntactic structure. A question pattern consists of three elements:

2 The Wikipedia articles are available to download from http://en.wikipedia.org/wiki/

Q: In which city is the IMF headquartered ?
   IMF: International Monetary Fund
   headquartered: v. headquarter, n. headquarters

Sent: Visiting the International Monetary Fund 's Washington headquarters for the first time, Arafat said the IMF already had provided advice that enabled the Palestinian Authority to overcome many hurdles in establishing institutions and laws.

Figure 3.3: An example of question word expansion

• BFORM is the basic form of the question pattern, expressed as a chunk sequence;

• CONS contains a set of constraints that the question's key chunks must satisfy for the pattern to match;

• SLOTS is the section that connects question pattern matching to answer pattern matching. It contains a set of slot assignments, each of which records which question key chunk fills the corresponding slot of the answer patterns.

Figure 3.4 shows a set of question patterns for the question class "SYNONYM". A given question is passed to all question classes one by one. For each class, the question is compared with each question pattern (QPTN) in the class. First, the question chunk sequence is matched against the basic form (BFORM) of the pattern. If it matches, we then check whether the key chunks of the question satisfy the constraints (CONS) of the pattern. Once all constraints are satisfied, the question is matched to the pattern and classified into the corresponding class. The slot section (SLOTS) of the pattern then records which key chunks of the question will fill the corresponding slots of the answer patterns. For example, given the question "What does AARP stand for?", the question chunk sequence "what do np0 vb0?" matches the basic form "what do np0 vb0" of the fourth pattern in Figure 3.4. Moreover, the question satisfies the constraint that the key chunk "vb0" be one of the phrases "mean|(translate to)|(refer to)|(stand for)", defined as "var1" in the variable section (VARS). As a result, we match the question to the question pattern and classify it into the "SYNONYM" class.
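The matching procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the system's actual code: the `QuestionPattern` class, the `match_question` function, and the dictionary representation of key chunks are hypothetical names introduced here, and the sketch assumes that the chunk sequence is tokenized with a space before the question mark and that key chunks are already lemmatized.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory form of one question pattern (QPTN); the fields
# mirror the BFORM, CONS and SLOTS elements described in the text.
@dataclass
class QuestionPattern:
    bform: str                                  # regex over the chunk sequence
    cons: dict = field(default_factory=dict)    # chunk key -> variable name
    slots: dict = field(default_factory=dict)   # slot name -> chunk key

# Variable section (VARS): each variable is a regex alternation of phrases.
VARS = {"var1": "mean|(translate to)|(refer to)|(stand for)"}

# One pattern of the SYNONYM class, taken from Figure 3.4.
SYNONYM = [
    QuestionPattern(r"what do np0 vb0 \?", {"vb0": "var1"}, {"slot0": "np0"}),
]

def match_question(chunk_seq, key_chunks, patterns, variables):
    """Return the slot fillers of the first matching pattern, else None."""
    for p in patterns:
        # Step 1: the chunk sequence must match the basic form.
        if not re.fullmatch(p.bform, chunk_seq):
            continue
        # Step 2: every constrained key chunk must match its variable.
        satisfied = all(
            re.fullmatch(variables[var], key_chunks.get(chunk, ""))
            for chunk, var in p.cons.items()
        )
        # Step 3: record which key chunks fill the answer-pattern slots.
        if satisfied:
            return {slot: key_chunks[chunk] for slot, chunk in p.slots.items()}
    return None

# "What does AARP stand for?" after chunking (assumed representation).
slots = match_question(
    "what do np0 vb0 ?", {"np0": "AARP", "vb0": "stand for"}, SYNONYM, VARS
)
print(slots)
```

For the AARP question the sketch returns a slot assignment binding "slot0" to "AARP". A full classifier would run the same function over every question class in turn and stop at the first class that yields a match.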

The current version of the system considers 31 question classes, shown in Figure 3.5. We manually build a set of question patterns for each class. In TREC 2006, 92 of the 403 factoid questions (about 23%) matched one of the question classes.

Besides question patterns, we also build answer patterns (APTN) to extract answers for questions that belong to one of the predefined question classes. An answer pattern indicates the expected position of the answer in a surface sentence. Answer patterns are represented as regular expressions over tokens and contain three kinds of variables:

• slot is bound to a key chunk of the question. The question chunk that a slot expects is assigned in the slot section of the corresponding question pattern. For example, in the second question pattern of Figure 3.4, "slot0" expects the question key chunk "np1", while in the third question pattern, "slot0" expects the question key chunk "np0".

• var is a set of alternative words, usually shared by several answer patterns and also used in the question patterns. For instance, in Figure 3.4, "var0" is set to "name|nickname|alternate|abbreviation|acronym|expansion".

• ANSWER indicates the expected answer.
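To make the role of the three variables concrete, the following Python sketch instantiates two SYNONYM answer patterns and applies them to a sentence. It is an illustration under stated assumptions, not the system's implementation: the function names `instantiate` and `extract_answer` are hypothetical, the example sentence is made up, and the regular expression substituted for ANSWER (a lazily matched run of word tokens) is a guess, since the text does not say how the answer span is delimited.

```python
import re

# Variable section shared with the question patterns (value from Figure 3.4).
VARS = {"var0": "name|nickname|alternate|abbreviation|acronym|expansion"}

# Two SYNONYM answer patterns, written here as Python raw strings.
APTNS = [
    r"slot0 (, whose|'s)( \w+| ,| \(| \)){0,5} (var0) be ANSWER",
    r"slot0 \(( born)? ANSWER \)",
]

# Assumption: the answer is a lazily matched run of word tokens; the source
# does not specify how the ANSWER variable is delimited.
ANSWER_GROUP = r"(?P<answer>\w+(?: \w+)*?)"

def instantiate(aptn, slots, variables):
    """Substitute slot fillers, variable alternatives and the answer group."""
    regex = aptn
    for name, filler in slots.items():
        regex = regex.replace(name, re.escape(filler))
    for name, alternatives in variables.items():
        regex = regex.replace(name, alternatives)
    return re.compile(regex.replace("ANSWER", ANSWER_GROUP))

def extract_answer(sentence, aptns, slots, variables):
    """Return the answer captured by the first matching pattern, else None."""
    for aptn in aptns:
        m = instantiate(aptn, slots, variables).search(sentence)
        if m:
            return m.group("answer")
    return None

# Made-up tokenized, lemmatized sentence, for illustration only.
sentence = "AARP ( American Association of Retired Persons ) be a nonprofit group"
print(extract_answer(sentence, APTNS, {"slot0": "AARP"}, VARS))
```

With "slot0" bound to "AARP", the first pattern does not apply and the parenthesis pattern captures "American Association of Retired Persons". Patterns that end in ANSWER would need a stricter answer boundary than the lazy group assumed here, for example a chunk or named-entity span.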

Figure 3.6 shows a set of answer patterns for the question class "SYNONYM". The patterns are manually authored. However, the TREC 2006 results show that their coverage is unsatisfactory: only 12 questions were correctly answered by surface pattern matching. These results motivate us to explore deeper linguistic analysis and to incorporate more external resources for Answer Extraction.

<CLASS key="SYNONYM">
  <QPTN_SET>
    <QPTN>
      <BFORM>(who|what) be np0 \?</BFORM>
      <CONS> </CONS>
      <SLOTS> <SLOT key="slot0">np0</SLOT> </SLOTS>
    </QPTN>
    <QPTN>
      <BFORM>what be np0 (of|for) np1 \?</BFORM>
      <CONS> <CON key="np0">var0</CON> </CONS>
      <SLOTS> <SLOT key="slot0">np1</SLOT> </SLOTS>
    </QPTN>
    <QPTN>
      <BFORM>what be np0 's np1 \?</BFORM>
      <CONS> <CON key="np1">var0</CON> </CONS>
      <SLOTS> <SLOT key="slot0">np0</SLOT> </SLOTS>
    </QPTN>
    <QPTN>
      <BFORM>what do np0 vb0 \?</BFORM>
      <CONS> <CON key="vb0">var1</CON> </CONS>
      <SLOTS> <SLOT key="slot0">np0</SLOT> </SLOTS>
    </QPTN>
    <QPTN>
      <BFORM>what be np0 vb0</BFORM>
      <CONS> <CON key="vb0">var2</CON> </CONS>
      <SLOTS> <SLOT key="slot0">np0</SLOT> </SLOTS>
    </QPTN>
  </QPTN_SET>
  <VARS>
    <VAR key="var0">name|nickname|alternate|abbreviation|acronym|expansion</VAR>
    <VAR key="var1">mean|(translate to)|(refer to)|(stand for)</VAR>
    <VAR key="var2">call|name</VAR>
  </VARS>
</CLASS>
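A pattern file in this XML format could be loaded with Python's standard `xml.etree.ElementTree` module, as sketched below. The `load_class` function and the dictionary layout it returns are hypothetical, and the embedded XML is a one-pattern excerpt of the SYNONYM class; the text does not describe how the system itself reads its pattern files.

```python
import xml.etree.ElementTree as ET

# One-pattern excerpt of the SYNONYM class, embedded for self-containment.
PATTERN_XML = r"""
<CLASS key="SYNONYM">
  <QPTN_SET>
    <QPTN>
      <BFORM>what do np0 vb0 \?</BFORM>
      <CONS> <CON key="vb0">var1</CON> </CONS>
      <SLOTS> <SLOT key="slot0">np0</SLOT> </SLOTS>
    </QPTN>
  </QPTN_SET>
  <VARS>
    <VAR key="var1">mean|(translate to)|(refer to)|(stand for)</VAR>
  </VARS>
</CLASS>
"""

def load_class(xml_text):
    """Parse one <CLASS> element into (class name, patterns, variables)."""
    root = ET.fromstring(xml_text.strip())
    # Each <VAR> maps a variable name to its alternation of phrases.
    variables = {v.get("key"): v.text for v in root.iter("VAR")}
    patterns = []
    for q in root.iter("QPTN"):
        patterns.append({
            "bform": q.findtext("BFORM").strip(),
            "cons": {c.get("key"): c.text for c in q.iter("CON")},
            "slots": {s.get("key"): s.text for s in q.iter("SLOT")},
        })
    return root.get("key"), patterns, variables

name, patterns, variables = load_class(PATTERN_XML)
print(name, patterns, variables)
```

Keeping the patterns in a declarative file of this kind means new question classes can be added without touching the matching code, which fits the manual pattern authoring described in this section.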

Question Class             Example
WHO_CREATE                 Who discovered prions?
WHAT_CREATE                What did Edward Binney and Howard Smith invent in 1903?
WHEN_CREATE                When was the International Criminal Court established?
WHERE_CREATE               When was the Black Panthers organization founded?
WHAT_BE_ORG_OF_PERSON      What record company is Fred Durst with?
WHO_BE_PRESIDENT_OF_ORG    Who is AARP's top official or CEO?
WHO_BE_MEMBER_OF_ORG       Who are the members of Insane Clown Posse?
WHEN_BORN                  When was James Dean born?
WHERE_BORN                 Where was James Dean born?
WHEN_DIE                   When did Franz Kafka die?
WHERE_DIE                  Where did Franz Kafka die?
HOW_DIE                    What did James Dean die of?
HOW_OLD_DIE                How old was Jean Harlow when Jean Harlow died?
WHERE_BURY                 Where is Jean Harlow buried?
NATIONALITY                What is minstrel Al Jolson's nationality?
OCCUPATION                 What was Gordon Gekko's profession?
WHO_MARRY                  Who is Tom Cruise married to?
WHO_FATHER                 Who was Horus father?
WHO_MOTHER                 Who was Horus mother?
WHERE_LIVE                 Where does Jennifer Capriati live?
WHERE_ORG_LOCATE           Where is AARP's headquarters?
SYNONYM                    What does AARP stand for?
PRODUCT                    What kind of business is Abercrombie & Fitch?
WHO_BE_IN_EVENT            Who was the on-board commander of the submarine Kursk?
WHEN_EVENT_HAPPEN          When was the first Crip gang started?
WHEN_EVENT                 What year was Alaska purchased?
WHERE_EVENT_HAPPEN         In what country did the game of croquet originate?
WHERE_EVENT                Where was the Miss Universe 2000 contest held?
PRIZE                      What prizes or awards has Frank Gehry won?
HOW_MANY_MEMBER            How many seats are in the cabin of a Concorde?
SPECIFICATION              What color are UPS trucks?

<CLASS key="SYNONYM">
  <APTN_SET>
    <APTN>slot0( ,)?( who| which)? be born ANSWER , </APTN>
    <APTN>slot0( ,)?( who| which)? be( \\w+)? (call|know as) ANSWER , </APTN>
    <APTN>slot0 (, whose|'s)( \\w+| ,| \(| \)){0,5} (var0) be ANSWER</APTN>
    <APTN>slot0 (be|,) (var0) (of|for) ANSWER</APTN>
    <APTN>slot0 \(( born)? ANSWER \)</APTN>
    <APTN>slot0 \[( born)? ANSWER \]</APTN>
    <APTN>slot0 ,( \\w+| ,| \(| \)){0,5} know as ANSWER</APTN>
    <APTN>change name from ANSWER to( \\w+| ,| \(| \)){0,5} slot0</APTN>
    <APTN>ANSWER( ,)?( who| which)? be born slot0 , </APTN>
    <APTN>ANSWER( ,)?( who| which)? be( \\w+)? (call|know as) slot0 , </APTN>
    <APTN>ANSWER (, whose|'s)( \\w+| ,| \(| \)){0,5} (var0) be slot0</APTN>
    <APTN>ANSWER (be|,) (var0) (of|for) slot0</APTN>
    <APTN>ANSWER \(( born)? slot0 \)</APTN>
    <APTN>ANSWER \[( born)? slot0 \]</APTN>
    <APTN>ANSWER ,( \\w+| ,| \(| \)){0,5} know as slot0</APTN>
    <APTN>change name from slot0 to( \\w+| ,| \(| \)){0,5} ANSWER</APTN>
  </APTN_SET>
</CLASS>
