ANTONIS ANASTASOPOULOS CS499 INTRODUCTION TO NLP PARSING WITH CFGS.

35  Download (0)

Full text

(1)

PARSING WITH CFG S

ANTONIS ANASTASOPOULOS CS499 INTRODUCTION TO NLP

https://cs.gmu.edu/~antonis/course/cs499-spring21/

With adapted slides by David Mortensen and Alan Black

(2)

LEVELS OF REPRESENTATION

2

discourse pragmatics

semantics syntax lexemes morphology

orthography phonology

phonetics

(text)

(speech)

(3)

STRUCTURE OF THIS LECTURE

3

1 Context-free 2 3

Grammars Parsing

Algorithms Chomsky

Normal Form 4 CKY

Algorithm

(4)

CONTEXT-FREE GRAMMARS

Using grammars:

- recognition - parsing

Parsing algorithms:

- top down - bottom up

Chomsky Normal Form (CNF)

CKY Algorithm (Cocke-Younger-Kasami)

4

(5)

PARSING VS WORD MATCHING

Consider the following sentence:

The student who was taught be Dr. A won the prize.

Who won the prize?

String matching:

“Dr. A won the prize.”

Parsing based

[[The student [who was taught be Dr. A]] won the prize.]

“the student won the prize”

5

(6)

CONTEXT-FREE GRAMMARS

Vocabulary of terminal symbols:

Set of non-terminal symbols (aka variables):

Special start symbols:

Production rules of the form , where

Σ

N S ∈ N

X → α X ∈ N

α ∈ (N ∪ Σ)*

6

(7)

TWO RELATED PROBLEMS

Input: sentence and CFG Output (recognition): tree iff

Output (parsing) one or more derivations for , under .

w = (w 1 , …w n ) 𝒢

w ∈ Language (𝒢)

w 𝒢

7

(8)

PARSING AS SEARCH

8

S

w

1

w

2

w

n

top down bottom-up

(9)

RECOGNIZERS AS SEARCH

Agenda = { state0 }

while(Agenda not empty)

s = pop a state from Agenda

if s is a success-state return s // valid parse tree else if s is not a failure-state:

generate new states from s

push new states onto Agenda return nil

9

(10)

EXAMPLE GRAMMAR AND LEXIVON

10

(11)

CHOMSKY NORMAL

FORM

(12)

CONTEXT-FREE GRAMMARS IN CHOMSKY NORMAL FORM

Vocabulary of terminal symbols:

Set of non-terminal symbols (aka variables):

Special start symbols:

Production rules of the form , where

Σ

N S ∈ N

X → α X ∈ N

α ∈ N, N ∪ Σ

12

(13)

CONVERTING CFGS TO CNF

For each rule

Rewrite as:

(Introduces a new non-terminal)

X → A B C

X → A X 2 X 2 → B C

13

(14)

14

(15)

CKY ALGORITHM

(16)

CKY ALGORITHM

For

For // width

For // left boundary // right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → w i } l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

16

(17)

CKY: CHART

17

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Book

this

flight

through

Houston

(18)

CKY: CHART

18

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun Book

this

flight

through

Houston

(19)

CKY: CHART

19

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun, Verb Book

this

flight

through

Houston

(20)

CKY: CHART

20

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun, Verb

Book Det

this Noun

flight Prep

through PNoun

Houston

(21)

CKY: CHART

21

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun, Verb

Book Det

this Noun

flight Prep

through PNoun, NP

Houston

(22)

CKY: CHART

22

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det

this Noun

flight Prep

through PNoun, NP

Houston

(23)

CKY: CHART

23

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP

this Noun

flight Prep

through PNoun, NP

Houston

(24)

CKY: CHART

24

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP

this Noun ——

flight Prep

through PNoun, NP

Houston

(25)

CKY: CHART

25

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP ——

this Noun ——

flight Prep

through PNoun, NP

Houston

(26)

CKY: CHART

26

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP ——

this Noun ——

flight Prep PP

through PNoun, NP

Houston

(27)

CKY: CHART

27

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP ——

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(28)

CKY: CHART

28

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb ——

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(29)

CKY: CHART

29

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(30)

CKY: CHART

30

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP,

S

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(31)

CKY: CHART

31

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP,

S ——

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(32)

CKY: CHART

32

For

For // width

For // left boundary

// right boundary

For // midpoint

Return true if

i = [1 … n]

C[i − 1, i] = {V ∣ V → wi} l = 2 … n :

i = 0 … n − l : k = i + l

j = i + 1 … k − 1 :

C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}

S ∈ C[0,n]

Noun,

Verb —— VP,

S —— S

Book Det NP —— NP

this Noun —— ——

flight Prep PP

through PNoun, NP

Houston

(33)

CKY EQUATIONS

33

(34)

CKY COMPLEXITY

Worst case:

Best is also worst case

(Others better in average case)

𝒪(n 3 ⋅ |𝒢|)

34

(35)

NEXT CLASS

Treebanks and probabilistic CFGs

35

Figure

Updating...

References

Related subjects :