PARSING WITH CFG S
ANTONIS ANASTASOPOULOS CS499 INTRODUCTION TO NLP
https://cs.gmu.edu/~antonis/course/cs499-spring21/
With adapted slides by David Mortensen and Alan Black
LEVELS OF REPRESENTATION
2
discourse pragmatics
semantics syntax lexemes morphology
orthography phonology
phonetics
(text)
(speech)
STRUCTURE OF THIS LECTURE
3
1 Context-free 2 3
Grammars Parsing
Algorithms Chomsky
Normal Form 4 CKY
Algorithm
CONTEXT-FREE GRAMMARS
Using grammars:
- recognition - parsing
Parsing algorithms:
- top down - bottom up
Chomsky Normal Form (CNF)
CKY Algorithm (Cocke-Younger-Kasami)
4
PARSING VS WORD MATCHING
Consider the following sentence:
The student who was taught be Dr. A won the prize.
Who won the prize?
String matching:
“Dr. A won the prize.”
Parsing based
[[The student [who was taught be Dr. A]] won the prize.]
“the student won the prize”
5
CONTEXT-FREE GRAMMARS
Vocabulary of terminal symbols:
Set of non-terminal symbols (aka variables):
Special start symbols:
Production rules of the form , where
Σ
N S ∈ N
X → α X ∈ N
α ∈ (N ∪ Σ)*
6
TWO RELATED PROBLEMS
Input: sentence and CFG Output (recognition): tree iff
Output (parsing) one or more derivations for , under .
w = (w 1 , …w n ) 𝒢
w ∈ Language (𝒢)
w 𝒢
7
PARSING AS SEARCH
8
S
w
1w
2… w
ntop down bottom-up
RECOGNIZERS AS SEARCH
Agenda = { state0 }
while(Agenda not empty)
s = pop a state from Agenda
if s is a success-state return s // valid parse tree else if s is not a failure-state:
generate new states from s
push new states onto Agenda return nil
9
EXAMPLE GRAMMAR AND LEXIVON
10
CHOMSKY NORMAL
FORM
CONTEXT-FREE GRAMMARS IN CHOMSKY NORMAL FORM
Vocabulary of terminal symbols:
Set of non-terminal symbols (aka variables):
Special start symbols:
Production rules of the form , where
Σ
N S ∈ N
X → α X ∈ N
α ∈ N, N ∪ Σ
12
CONVERTING CFGS TO CNF
For each rule
Rewrite as:
(Introduces a new non-terminal)
X → A B C
X → A X 2 X 2 → B C
13
14
CKY ALGORITHM
CKY ALGORITHM
For
For // width
For // left boundary // right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → w i } l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
16
CKY: CHART
17
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Book
this
flight
through
Houston
CKY: CHART
18
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun Book
this
flight
through
Houston
CKY: CHART
19
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun, Verb Book
this
flight
through
Houston
CKY: CHART
20
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun, Verb
Book Det
this Noun
flight Prep
through PNoun
Houston
CKY: CHART
21
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun, Verb
Book Det
this Noun
flight Prep
through PNoun, NP
Houston
CKY: CHART
22
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det
this Noun
flight Prep
through PNoun, NP
Houston
CKY: CHART
23
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det NP
this Noun
flight Prep
through PNoun, NP
Houston
CKY: CHART
24
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det NP
this Noun ——
flight Prep
through PNoun, NP
Houston
CKY: CHART
25
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det NP ——
this Noun ——
flight Prep
through PNoun, NP
Houston
CKY: CHART
26
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det NP ——
this Noun ——
flight Prep PP
through PNoun, NP
Houston
CKY: CHART
27
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det NP ——
this Noun —— ——
flight Prep PP
through PNoun, NP
Houston
CKY: CHART
28
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb ——
Book Det NP —— NP
this Noun —— ——
flight Prep PP
through PNoun, NP
Houston
CKY: CHART
29
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb —— VP
Book Det NP —— NP
this Noun —— ——
flight Prep PP
through PNoun, NP
Houston
CKY: CHART
30
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb —— VP,
S
Book Det NP —— NP
this Noun —— ——
flight Prep PP
through PNoun, NP
Houston
CKY: CHART
31
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb —— VP,
S ——
Book Det NP —— NP
this Noun —— ——
flight Prep PP
through PNoun, NP
Houston
CKY: CHART
32
For
For // width
For // left boundary
// right boundary
For // midpoint
Return true if
i = [1 … n]
C[i − 1, i] = {V ∣ V → wi} l = 2 … n :
i = 0 … n − l : k = i + l
j = i + 1 … k − 1 :
C[i, k] = C[i, k] ∪ {V ∣ V → YZ, Y ∈ C[i, j], Z ∈ C[j, k]}
S ∈ C[0,n]
Noun,
Verb —— VP,
S —— S
Book Det NP —— NP
this Noun —— ——
flight Prep PP
through PNoun, NP
Houston
CKY EQUATIONS
33
CKY COMPLEXITY
Worst case:
Best is also worst case
(Others better in average case)
𝒪(n 3 ⋅ |𝒢|)
34
NEXT CLASS
Treebanks and probabilistic CFGs
35