• No results found

Lecture 14: Ques-on Answering

N/A
N/A
Protected

Academic year: 2021

Share "Lecture 14: Ques-on Answering"

Copied!
32
0
0

Loading.... (view fulltext now)

Full text

(1)

Lecture 14:

Ques-on Answering

Wei Xu

(2)

2

QA is very broad

Factoid QA: what states border Mississippi?, when was Barack Obama born?“Ques<on answering” as a term is so broad as to be meaninglessIs P=NP?What is the transla=on of [sentence] into French? [McCann et al., 2018] ‣ Lots of this could be handled by QA from a knowledge base, if we had a big enough knowledge base ‣ What is 4+5?

(3)

Classical Ques-on Answering

Q: “where was Barack Obama born”

‣ Form seman-c representa-on from seman-c parsing, execute against structured knowledge base

λx. type(x, Location) ∧ born_in(Barack_Obama, x)

(other representa-ons like SQL possible too…)

‣ How to deal with open-domain data/rela-ons? Need data to learn how to ground every predicate or need to be able to produce predicates in a zero-shot way

(4)

Reading Comprehension

‣ “AI challenge problem”: answer ques-on given context

Richardson (2013)

‣ MCTest (2013): 500

passages, 4 ques-ons per passage

‣ Two ques-ons per

passage explicitly require cross-sentence reasoning

‣ Recognizing Textual Entailment (2006)

(5)

Dataset Explosion

‣ 10+ QA datasets released since 2015

‣ Ques-on answering: ques-ons are in natural language

‣ “Cloze” task: word (o`en an en-ty) is removed from a sentence

‣ Answers: mul-ple choice or require picking from the passage

‣ Require human annota-on

‣ Answers: mul-ple choice, pick from passage, or pick from vocabulary

‣ Can be created automa-cally from things that aren’t ques-ons

‣ Children’s Book Test, CNN/Daily Mail, SQuAD, TriviaQA are most well-known (others: SearchQA, MS Marco, RACE, WikiHop, …)

(6)

Children’s Book Test

Hill et al. (2015)

‣ Children’s Book Test: take a sec-on of a children’s story, block out an en-ty and predict it (one-doc mul--sentence cloze task)

(7)

bAbI

‣ Evalua-on on 20 tasks proposed as building blocks for building “AI-complete” systems

‣ Various levels of difficulty, exhibit different linguis-c phenomena

Weston et al. (2014)

(8)

Dataset Proper-es

‣ O`en shallow methods work well because most answers are in a single sentence (SQuAD, MCTest)

‣ Some explicitly require linking between mul-ple sentences (MCTest)

‣ Axis 1: QA vs. cloze (Children’s Book Test)

‣ Axis 2: single-sentence vs. passage

‣ Axis 3: single-document (datasets in this lecture) vs. mul--document (TriviaQA, WikiHop, HotPotQA, …)

(9)
(10)

Memory Networks

‣ Memory networks let you reference input with agen-on

‣ Encode input items into two vectors: a key and a value

‣ Keys compute agen-on weights given a query, weighted sum of values gives the output

(11)

Memory Networks

‣ Three layers of memory network where the query representa-on is updated addi-vely based on the memories at each step

Sukhbaatar et al. (2015)

‣ How to encode the sentences?

‣ Bag of words (average embeddings)

‣ Posi-onal encoding: mul-ply each word by a vector capturing posi-on in sentence

(12)

Evalua-on: bAbI

‣ 3-hop memory network does pregy well, beger than LSTM at processing these types of examples

(13)

Evalua-on: Children’s Book Test

‣ Outperforms LSTMs substan-ally with

(14)

Memory Network Takeaways

‣ Useful for cloze tasks where far-back context is necessary

‣ What can we do with more basic agen-on?

‣ Memory networks provide a way of agending to abstrac-ons over the input

(15)
(16)

CNN/Daily Mail

Hermann et al. (2015), Chen et al. (2016)

‣ Single-document, (usually) single-sentence cloze task

‣ Formed based on ar-cle

summaries — informa-on should mostly be present, makes it

easier than Children’s Book Test

‣ Need to process the ques-on, can’t just use LSTM LMs

(17)

CNN/Daily Mail

Hermann et al. (2015), Chen et al. (2016)

X visited England ||| Mary visited England

‣ LSTM reader: encode ques-on, encode passage, predict en-ty

‣ Can also use textual entailment-like models

X visited England

Mary visited England

Mary

Mul-class classifica-on problem over en--es

in the document

(18)

CNN/Daily Mail

Hermann et al. (2015) ‣ Agen-ve reader: u = encode query s = encode sentence r = agen-on(u -> s) predic-on = f(candidate, u, r) ‣ Uses fixed-size

representa-ons for the

final predic-on, mul-class classifica-on

(19)

CNN/Daily Mail

‣ Chen et al (2016): small changes to the agen-ve reader

Stanford Agen-ve Reader 76.2 76.5 79.5 78.7

‣ Addi-onal analysis of the task found that many of the remaining ques-ons were unanswerable or

extremely difficult

(20)
(21)

SQuAD

‣ Single-document, single-sentence ques-on-answering task where the answer is always a substring of the passage

Rajpurkar et al. (2016)

(22)

SQuAD

What was Marie Curie the first female recipient of?

Rajpurkar et al. (2016)

first female recipient of the Nobel Prize . START END

‣ Like a tagging problem over the sentence (not mul-class classifica-on), but we need some way of agending to the query

(23)

Bidirec-onal Agen-on Flow

‣ Passage (context) and query are both encoded with BiLSTMs

‣ Context-to-query agen-on: compute so`max over columns of S, take weighted sum of u based on agen-on weights for each passage word

Seo et al. (2016)

passage H

query U

S

ij

= h

i

· u

j

ij = softmaxj (Sij ) ‣ dist over query

˜

ui = X

j

ij uj ‣ query “specialized”

(24)

Bidirec-onal Agen-on Flow

Seo et al. (2016)

Each passage

word now “knows about” the query

(25)

25

QA with BERT

Devlin et al. (2019) What was Marie Curie the first female recipient of ? [SEP] One of the most famous people born in Warsaw was Marie … ‣ Predict start and end posi<ons in passage ‣ No need for cross-a8en<on mechanisms!

(26)

SQuAD SOTA: 2018

‣ nlnet, QANet, r-net — dueling super complex

systems (much more than BiDAF…)

‣ BERT: transformer-based approach with pretraining on 3B tokens

(27)

SQuAD 2.0 SOTA: Spring 2019

27

SQuAD SOTA: Spring 19

‣ Industry contest ‣ SQuAD 2.0: harder dataset because some ques<ons are unanswerable

(28)

SQuAD 2.0 SOTA: Fall 2019

28

SQuAD SOTA: Today

‣ Harder QA sezngs are needed! ‣ Performance is very saturated

(29)

SQuAD 2.0 SOTA: Today

29

SQuAD SOTA: Today

‣ Harder QA sezngs are needed! ‣ Performance is very saturated

(30)

30

TriviaQA

Joshi et al. (2017) ‣ Totally figuring this out is very challenging ‣ Coref:
 the failed campaign
 movie of the same name ‣ Lots of surface clues: 1961, campaign, etc. ‣ Systems can do well without really understanding the text

(31)

31

What are these models learning?

“Who…”: knows to look for people“Which film…”: can iden<fy movies and then spot keywords that are related to the ques<on ‣ Unless ques<ons are made super tricky (target closely-related en<<es who are easily confused), they’re usually not so hard to answer

(32)

Takeaways

‣ Memory networks let you reference input in an agen-on-like way, useful for generalizing language models to long-range reasoning

‣ Complex agen-on schemes can match queries against input texts and iden-fy answers

‣ Many flavors of reading comprehension tasks: cloze or actual ques-ons, single or mul--sentence

References

Related documents

The inclusion of the word has shocked some school librarians, who have pledged to ban the book from elementary schools, and reopened the debate over what constitutes

Members of Anti-Terrorism Intelligence Section shall not maintain or utilize the Division’s intelligence materials outside of their official work location without the written

Unique datum is most widely report personality test attempts to identify the dark triad personality traits are sometimes not a specific trait being considered extraverted and

How to Prevent Cancer lifestyle yaşam tarzı adjustment ayarlama,düzenleme dietary öğünsel prevent from önlemek ,alıkoymak fatal ölümcül account for oluşturmak,açıklamak

FY, financial year; IC, institutes and centers; NCI, National Cancer Institute; NHGRI, National Human Genome Research Institute; NHLBI, National Heart Lung and Blood Institute;

The strategy of increasing flexibility and responsiveness is, on the contrary, positively associated with the introduction of process and any innovations and

Based on the above results, the SWAT calibrated for the Pomba River sub-basin with a control section in Astolfo Dutra can be applied to the hydrological simulation of the Novo

This chapter continues with background information about Home Retail Group, including their involvement with electric vehicles, their management and sustainability control systems,