• No results found

ANTONIS ANASTASOPOULOS CS499 INTRODUCTION TO NLP PARSING WITH CFGS.

N/A
N/A
Protected

Academic year: 2021

Share "ANTONIS ANASTASOPOULOS CS499 INTRODUCTION TO NLP PARSING WITH CFGS."

Copied!
35
0
0

Loading.... (view fulltext now)

Full text

(1)

PARSING WITH CFG S

ANTONIS ANASTASOPOULOS CS499 INTRODUCTION TO NLP

https://cs.gmu.edu/~antonis/course/cs499-spring21/

With adapted slides by David Mortensen and Alan Black

(2)

LEVELS OF REPRESENTATION

2

discourse pragmatics

semantics syntax lexemes morphology

orthography phonology

phonetics

(text)

(speech)

(3)

STRUCTURE OF THIS LECTURE

3

1 Context-free 2 3

Grammars Parsing

Algorithms Chomsky

Normal Form 4 CKY

Algorithm

(4)

CONTEXT-FREE GRAMMARS

Using grammars:

- recognition - parsing

Parsing algorithms:

- top down - bottom up

Chomsky Normal Form (CNF)

CKY Algorithm (Cocke-Younger-Kasami)

4

(5)

PARSING VS WORD MATCHING

Consider the following sentence:

The student who was taught be Dr. A won the prize.

Who won the prize?

String matching:

“Dr. A won the prize.”

Parsing based

[[The student [who was taught be Dr. A]] won the prize.]

“the student won the prize”

5

(6)

CONTEXT-FREE GRAMMARS

Vocabulary of terminal symbols:

Set of non-terminal symbols (aka variables):

Special start symbols:

Production rules of the form , where

Σ

N S ∈ N

X → α X ∈ N

α ∈ (N ∪ Σ)*

6

(7)

TWO RELATED PROBLEMS

Input: sentence and CFG Output (recognition): tree iff

Output (parsing) one or more derivations for , under .

w = (w 1 , …w n ) 𝒢

w ∈ Language (𝒢)

w 𝒢

7

(8)

PARSING AS SEARCH

8

S

w

1

w

2

w

n

top down bottom-up

(9)

RECOGNIZERS AS SEARCH

Agenda = { state0 }

while(Agenda not empty)

s = pop a state from Agenda

if s is a success-state return s // valid parse tree else if s is not a failure-state:

generate new states from s

push new states onto Agenda return nil

9

(10)

EXAMPLE GRAMMAR AND LEXIVON

10

(11)

CHOMSKY NORMAL

FORM

(12)

CONTEXT-FREE GRAMMARS IN CHOMSKY NORMAL FORM

Vocabulary of terminal symbols:

Set of non-terminal symbols (aka variables):

Special start symbols:

Production rules of the form , where

Σ

N S ∈ N

X → α X ∈ N

α ∈ N, N ∪ Σ

12

(13)

CONVERTING CFGS TO CNF

For each rule

Rewrite as:

(Introduces a new non-terminal)

X → A B C

X → A X 2 X 2 → B C

13

(14)

14

(15)

CKY ALGORITHM

(16)

CKY ALGORITHM

For

For // width

For // left boundary // right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → w i } l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

16

(17)

CKY: CHART

17

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Book

this

flight

through

Houston

(18)

CKY: CHART

18

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun Book

this

flight

through

Houston

(19)

CKY: CHART

19

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun, Verb Book

this

flight

through

Houston

(20)

CKY: CHART

20

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun, Verb

Book Det

this Noun

flight Prep

through PNoun

Houston

(21)

CKY: CHART

21

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun, Verb

Book Det

this Noun

flight Prep

through PNoun, NP

Houston

(22)

CKY: CHART

22

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det

this Noun

flight Prep

through PNoun, NP

Houston

(23)

CKY: CHART

23

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP

this Noun

flight Prep

through PNoun, NP

Houston

(24)

CKY: CHART

24

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP

this Noun ——

flight Prep

through PNoun, NP

Houston

(25)

CKY: CHART

25

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP ——

this Noun ——

flight Prep

through PNoun, NP

Houston

(26)

CKY: CHART

26

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP ——

this Noun ——

flight Prep PP

through PNoun, NP

Houston

(27)

CKY: CHART

27

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP ——

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(28)

CKY: CHART

28

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(29)

CKY: CHART

29

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(30)

CKY: CHART

30

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP,

S

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(31)

CKY: CHART

31

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP,

S ——

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(32)

CKY: CHART

32

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP,

S —— S

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(33)

CKY EQUATIONS

33

(34)

CKY COMPLEXITY

Worst case:

Best is also worst case

(Others better in average case)

𝒪(n 3 ⋅ |𝒢|)

34

(35)

NEXT CLASS

Treebanks and probabilistic CFGs

35

References

Related documents

Also, although many researchers have used various multi criteria decision making algorithms for cutting parameters optimization of GFRP/E composites, the proposed

Within groups (good readers and poor-to-dyslexic readers) differences on indi- vidual measures of restoration (words, pseudowords and speech sound segments) were investigated using

In order to apply this assessment method, it is necessary to choose the cash flow definition, project a cash flow in a selected future period, calculate a discount rate,

The contribution of this research includes: a bibliometric analysis of the papers that focus on risk management in megaprojects; a systematization and classification of the risks; tw

Diff erences in the standardization of SWEE procedures aff ect the perceived autonomy and self-determination of the teacher; diff erences in the stakes attached to success lead

Um algoritmo é apresen- tado e se utiliza um arranjo das ferramentas, iniciando com a análise das componentes principais (PCA) para selecionar as características mais relevantes

Europe: Increase for industrial machinery Asia: Automotive and aftermarket business lead the way. 100 million yen 100

However, the increasing divergence of current account deficits in emerging Europe versus emerging Asia has been mainly driven by (1) the rapidly rising presence of foreign banks