An Effective Enumeration Algorithm of Parses for Ambiguous CFL

An Efficient Enumeration Algorithm of Parses for Ambiguous Context-Free Languages

Nariyoshi YAMAI*, Tadashi SEKO†, Noboru KUBO‡ and Toru KAWATA‡

† Department of Information Engineering, Nara National College of Technology, Yamatokoriyama, Nara 639-11, Japan

‡ Computer Systems Laboratories, Corporate Research and Development Group, SHARP Corporation, Tenri, Nara 632, Japan

Abstract

An efficient algorithm that enumerates parses of ambiguous context-free languages is described, and its time and space complexities are discussed.

When context-free parsers are used for natural language parsing, pattern recognition, and so forth, there may be a great number of parses for a sentence. One common strategy for efficient enumeration of parses is to assign an appropriate weight to each production, and to enumerate parses in the order of the total weight of all applied productions. However, the existing algorithms that take this strategy can be applied only to problems in limited areas such as regular languages; in the other areas only inefficient exhaustive searches are known.

In this paper, we first introduce a hierarchical graph suitable for enumeration. Using this graph, enumeration of parses in the order of acceptability is equivalent to finding paths of this graph in the order of length. Then, we present an efficient enumeration algorithm with this graph, which can be applied to arbitrary context-free grammars. For enumeration of $k$ parses in the order of the total weight of all applied productions, the time and space complexities of our algorithm are $O(n^3 + kn^2)$ and $O(n^2 + kn)$, respectively.

1 Introduction

Context-free parsers are commonly used for natural language parsing, pattern recognition, and so forth. In these applications, there may be a great number of parses (or derivations) for a sentence, only a few of which are needed in later processes. Therefore, we look up only a few promising parses and avoid an inefficient exhaustive search of parses. In order to find a few promising parses efficiently, a common strategy is to assign an appropriate weight to each production and to look up parses in the order of the total weight of all applied productions. If the assigned weights are selected carefully so that they correlate strongly with whether a parse is accepted or not, looking up parses in the order of the total weight is equivalent to enumeration of parses in the order of acceptability. For example, in the punctuation problem of Japanese sentences, the number of phrases of the sentence is known to be an excellent candidate for the weight of parses. However, the algorithms proposed so far that take this strategy apply only to problems in limited areas such as regular languages, and they do not apply to general context-free languages.
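The weighting strategy can be illustrated with a small Python sketch. It assumes that a parse is represented as a list of production index numbers; the weights and the four parses below are hypothetical values chosen only for illustration. The function `k_best` is the exhaustive baseline (sort every parse by total weight), which is exactly the cost an efficient enumeration algorithm tries to avoid.

```python
# Hypothetical production weights (index -> weight); not taken from the paper.
WEIGHTS = {1: 0, 2: 1, 3: 1, 4: 3, 5: 1, 6: 1}

# Hypothetical parses, each a list of applied production indices.
PARSES = [
    [1, 5, 3, 2, 5, 3, 2],
    [1, 5, 3, 2, 6, 4, 2],
    [1, 6, 4, 2, 5, 3, 2],
    [1, 6, 4, 2, 6, 4, 2],
]


def total_weight(parse, weights=WEIGHTS):
    """Sum the weights of all productions applied in the parse."""
    return sum(weights[p] for p in parse)


def k_best(parses, k, weights=WEIGHTS):
    """Exhaustive baseline: sort every parse by total weight, keep the first k."""
    return sorted(parses, key=lambda p: total_weight(p, weights))[:k]


if __name__ == "__main__":
    for parse in k_best(PARSES, 2):
        print(total_weight(parse), parse)
```

With these assumed weights, the parse that uses production 3 twice comes first, because production 4 is the only one given a large weight.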

In this paper, we present an efficient enumeration algorithm based on this strategy, which can be applied to general context-free grammars. We introduce a data structure suitable for enumeration of parses, named a parse graph, and show how to construct a parse graph in Section 3. With a parse graph, a path between two special vertices, some of whose arcs are replaced iteratively by the paths denoted by their labels, represents a right parse of the parsed sentence. Because the length of a path represents the total weight of all applied productions of a parse, enumeration of parses in the order of the total weight of all applied productions is equivalent to finding paths on the parse graph in the order of length. In Section 4, we outline how to enumerate the parses of an ambiguous sentence in the order of their weight, using the parse graph. We also discuss the time and space complexities of the algorithm in that section.

2 Context-Free Parsing Algorithm

Several general context-free parsing algorithms have been proposed so far, namely the Cocke-Younger-Kasami algorithm [2, 3], Earley's algorithm [4], Valiant's algorithm [5], the Graham-Harrison-Ruzzo algorithm [6, 8], and so forth. The features of these algorithms are the following. The Cocke-Younger-Kasami algorithm (CYK algorithm for short) is a bottom-up parsing algorithm and has $O(n^3)$ time complexity, where $n$ is the length of the sentence. In this algorithm, the grammar is required to be written in Chomsky normal form. Earley's algorithm is a top-down parsing algorithm and also has $O(n^3)$ time complexity. In contrast to the CYK algorithm, no special production form is required in Earley's algorithm. Valiant's algorithm and the Graham-Harrison-Ruzzo algorithm (GHR algorithm for short) are modified versions of the CYK algorithm and Earley's algorithm, respectively. Both of them use matrix multiplication in order to reduce the time complexity. The time complexity of Valiant's algorithm is $O(n^{2.81})$ and that of the GHR algorithm is $O(n^3 / \log n)$. However, in both algorithms the overhead of matrix multiplication is so large that they do not seem suitable for practical use.

In this paper, we adopt Earley's algorithm as the base of our algorithm for the following two reasons:

(1) No special production form is required.

(2) Earley's algorithm seems more suitable than Valiant's algorithm and the GHR algorithm because the overhead of these two algorithms is quite large.

Let $G = (V_N, V_T, P, S)$ be a grammar, where $V_N$ is the set of nonterminal symbols, $V_T$ is the set of terminal symbols, $P$ is the set of productions, and $S \in V_N$ is the start symbol. In Earley's algorithm, the item lists $I_0, I_1, \ldots, I_{n+1}$ are created, where $n$ is the length of the parsed sentence. Each item list consists of several items $[A \to \alpha \cdot \beta\ (p),\ f]$, where $A \to \alpha\beta \in P$, $p$ is the index number of the production, $\cdot$ is the metasymbol that shows how much of the right side of the production has been recognized so far, and $f$ is an integer which denotes the position in the input string at which we began to look for that instance of the production. The set of item lists $\{I_0, I_1, \ldots, I_n, I_{n+1}\}$ is called the parse list.

As for the time and space complexities of Earley's algorithm, the following are known [1].

(e-1) The time and space complexities for parsing a sentence by Earley's algorithm are $O(n^3)$ and $O(n^2)$, respectively, where $n$ is the length of the parsed sentence.

(e-2) The time complexity for deriving a parse from the parse list is $O(n^2)$, where $n$ is the length of the parsed sentence.
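The item lists described above can be made concrete with a minimal Earley recognizer in Python. This is a sketch under stated assumptions, not the implementation analysed in [1]: an item $[A \to \alpha \cdot \beta\ (p),\ f]$ is encoded as the tuple `(p, dot, f)`, the grammar is the small ambiguous grammar of Figure 1 extended with the production $S' \to S\$$ (0), and each list is closed by a plain fixpoint loop rather than by a tuned worklist. The names `GRAMMAR`, `earley`, and `accepts` are illustrative.

```python
# Productions: index -> (left side, right side). Index 0 is the added S' -> S $.
GRAMMAR = {
    0: ("S'", ("S", "$")),
    1: ("S", ()),            # S -> epsilon
    2: ("S", ("S", "J")),
    3: ("J", ("F",)),
    4: ("J", ("I",)),
    5: ("F", ("x",)),
    6: ("I", ("x",)),
}
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}


def earley(sentence):
    """Return the item lists I_0..I_{n+1}; an item is the tuple (p, dot, f)."""
    tokens = list(sentence) + ["$"]
    lists = [set() for _ in range(len(tokens) + 1)]
    lists[0].add((0, 0, 0))
    for j in range(len(tokens) + 1):
        changed = True
        while changed:  # apply predictor and completer until I_j stops growing
            changed = False
            for p, dot, f in list(lists[j]):
                lhs, rhs = GRAMMAR[p]
                if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                    # predictor: start every production of the expected nonterminal
                    new = {(q, 0, j) for q, (l, _) in GRAMMAR.items() if l == rhs[dot]}
                elif dot == len(rhs):
                    # completer: advance the items in I_f that were waiting for lhs
                    new = {(q, d + 1, g) for q, d, g in lists[f]
                           if d < len(GRAMMAR[q][1]) and GRAMMAR[q][1][d] == lhs}
                else:
                    new = set()  # a terminal is expected; the scanner handles it
                if not new <= lists[j]:
                    lists[j] |= new
                    changed = True
        if j < len(tokens):
            # scanner: move the dot over the next input symbol
            for p, dot, f in lists[j]:
                rhs = GRAMMAR[p][1]
                if dot < len(rhs) and rhs[dot] == tokens[j]:
                    lists[j + 1].add((p, dot + 1, f))
    return lists


def accepts(sentence):
    """The sentence is accepted if [S' -> S $ . (0), 0] is in the last list."""
    return (0, 2, 0) in earley(sentence)[-1]


if __name__ == "__main__":
    print(accepts("xx"), [len(item_list) for item_list in earley("xx")])
```

The fixpoint loop is a deliberate simplification: it handles the empty production $S \to \varepsilon$ without special cases, at the price of rescanning each list several times.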

3 Parse Graphs

3.1 The features of parse graphs

The parse graph is a directed graph which consists of several connected components. Each connected component is called a layer of the parse graph. Each layer is an acyclic graph that has only one source, and it corresponds to either a nonterminal symbol or an integer. A layer corresponding to a nonterminal symbol has only one sink. With this graph, we can extract parses more efficiently than with a parse list of Earley's algorithm. As shown in the next section, a path between two special vertices, some of whose arcs are replaced iteratively by the paths denoted by their labels, represents a right parse of the parsed sentence.

In the remainder of this paper, we use the following notations.

$L(f)$: The layer corresponding to an integer $f$.

$L(A)$: The layer corresponding to a nonterminal symbol $A$.

$L(v)$: The layer containing a vertex $v$.

$L(e)$: The layer containing an arc $e$.

$v_s(X)$: The source of the layer $L(X)$, where $X$ is either an integer, a nonterminal symbol, a vertex, or an arc.

$v_t(A)$: The sink of the layer $L(A)$, where $A$ is a nonterminal symbol. Note that the layer corresponding to a nonterminal symbol has only one sink.

In the parse graph, each arc has one of the following labels.

(1) An index number $p$ of a production, which denotes the derivation by $A \to \alpha\ (p)$.

(2) A nonterminal symbol $A$, which denotes the derivation $A \Rightarrow^* \varepsilon$.

(3) The index of a vertex $w$, which denotes the paths from $v_s(w)$ to $w$.

When we describe the arc $e = (u, v)$ with its label of each kind, we use the notations $e(p)$, $e\langle A \rangle$, $e[w]$, or the alternative notations $(u, v; (p))$, $(u, v; \langle A \rangle)$, $(u, v; [w])$, respectively.

Instead of an item of the form $[A \to \alpha \cdot \beta\ (p),\ f]$ in Earley's algorithm, we use the triplet $[A \to \alpha \cdot \beta\ (p),\ f,\ v]$ as an item of our algorithm for constructing a parse graph, where $v$ is the index of a vertex.

For example, we parse the sentence $xx$ of the grammar shown in Figure 1. The parse list and the parse graph generated from this sentence are shown in Figure 2 and Figure 3, respectively.

In Figure 3, the label "(2)" of the arc from vertex #8 to vertex #9 indicates the derivation by $S \to S J\ (2)$, the label "$\langle S \rangle$" of the arc from vertex #0 to vertex #1 indicates the derivation $S \Rightarrow^* \varepsilon$, and the label "[7]" of the arc from vertex #2 to vertex #8 indicates the paths from vertex #0 to vertex #7.

3.2 An algorithm for constructing a parse graph

Our algorithm for constructing a parse graph is based on Earley's algorithm. In Earley's algorithm, one of three operations is performed on each item, depending on its form, to add more items to the item lists. In our algorithm, these operations not only add more items to the item lists but also add new vertices and arcs to the parse graph, as described by the three operations below.

$S \to \varepsilon$ (1)
$S \to S J$ (2)
$J \to F$ (3)
$J \to I$ (4)
$F \to x$ (5)
$I \to x$ (6)

Figure 1: An ambiguous context-free grammar

$I_0$:
$S' \to \cdot S\$$ (0), 0, 0
$S \to \cdot$ (1), 0, 0
$S \to \cdot S J$ (2), 0, 0
$S' \to S \cdot \$$ (0), 0, 1
$S \to S \cdot J$ (2), 0, 2
$J \to \cdot F$ (3), 0, 0
$J \to \cdot I$ (4), 0, 0
$F \to \cdot x$ (5), 0, 0
$I \to \cdot x$ (6), 0, 0

$I_1$:
$F \to x \cdot$ (5), 0, 0
$I \to x \cdot$ (6), 0, 0
$J \to F \cdot$ (3), 0, 4
$J \to I \cdot$ (4), 0, 6
$S \to S J \cdot$ (2), 0, 8
$S' \to S \cdot \$$ (0), 0, 10
$S \to S \cdot J$ (2), 0, 11
$J \to \cdot F$ (3), 1, 12
$J \to \cdot I$ (4), 1, 12
$F \to \cdot x$ (5), 1, 12
$I \to \cdot x$ (6), 1, 12

$I_2$:
$F \to x \cdot$ (5), 1, 12
$I \to x \cdot$ (6), 1, 12
$J \to F \cdot$ (3), 1, 14
$J \to I \cdot$ (4), 1, 16
$S \to S J \cdot$ (2), 0, 18
$S' \to S \cdot \$$ (0), 0, 20
$S \to S \cdot J$ (2), 0, 21
$J \to \cdot F$ (3), 2, 22
$J \to \cdot I$ (4), 2, 22
$F \to \cdot x$ (5), 2, 22
$I \to \cdot x$ (6), 2, 22

$I_3$:
$S' \to S\$ \cdot$ (0), 0, 20

Figure 2: A parse list for the sentence $xx$ of the grammar in Figure 1

Figure 3: A parse graph for the sentence $xx$ of the grammar in Figure 1. The graph consists of Layer 0, Layer 1, Layer 2, and Layer $S$. Arc labels are written (p) for "production p", [v] for "vertex v", and <S> for "nonterminal S".

Operation 1 (scanner)

The scanner is performed when an item in $I_j$ is of the form $[A \to \alpha \cdot a_{j+1} \beta\ (p),\ f,\ v]$. It puts the item $[A \to \alpha a_{j+1} \cdot \beta\ (p),\ f,\ v]$ into $I_{j+1}$.

Operation 2 (predictor)

The predictor is performed when an item in $I_j$ is of the form $[A \to \alpha \cdot B \beta\ (p),\ f,\ v]$. It adds the items $[B \to \cdot \gamma_k\ (p_k),\ j,\ v_s(j)]$ for the $B$-productions $B \to \gamma_k\ (p_k)$ to $I_j$, unless these items have already been added to $I_j$. If the vertex $v_s(j)$ has not been created yet, the predictor creates $v_s(j)$ in the layer $L(j)$. In particular, in the case where $B \Rightarrow C_1 C_2 \cdots C_m \Rightarrow^* \varepsilon$ and $C_1 C_2 \cdots C_m \in V_N^*$, the predictor adds the vertices $v_s(B), v_1, v_2, \ldots, v_{m-1}, v_m, v_t(B)$ and the arcs $(v_s(B), v_1; \langle C_1 \rangle)$, $(v_1, v_2; \langle C_2 \rangle)$, $(v_2, v_3; \langle C_3 \rangle)$, $\ldots$, $(v_{m-2}, v_{m-1}; \langle C_{m-1} \rangle)$, $(v_{m-1}, v_m; \langle C_m \rangle)$, $(v_m, v_t(B); (p))$ to $L(B)$ if they are not in $L(B)$, and performs one of the following:

(a) If an item of the form $[A \to \alpha B \cdot \beta\ (p),\ f,\ w]$ is already in $I_j$, then add the arc $(v, w; \langle B \rangle)$ to the parse graph.

(b) Otherwise, add the vertex $w$ and the arc $(v, w; \langle B \rangle)$ to the parse graph, and add the item $[A \to \alpha B \cdot \beta\ (p),\ f,\ w]$ to $I_j$.

Operation 3 (completer)

The completer is performed when an item in $I_j$ is of the form $[A \to \alpha \cdot\ (p),\ f,\ v]$. It performs one of the following:

(a) If $f = j$, then the item would be processed by the predictor. Therefore, the completer does nothing.

(b) If $f \neq j$ and there exist an item of the form $[A \to \beta \cdot\ (q),\ f,\ u]$ ($p \neq q$, $u \neq v$) in $I_j$ and the arc $(u, w; (q))$ in the parse graph, then add the arc $(v, w; (p))$ to the parse graph.

(c) Otherwise, add a new vertex $x$ and a new arc $(v, x; (p))$ to the parse graph. Furthermore, for all items of the form $[B_k \to \gamma_k \cdot A \delta_k\ (p_k),\ f_k,\ u_k]$ in $I_f$, perform one of the following:

(c-1) If there exists an item of the form $[B_k \to \gamma_k A \cdot \delta_k\ (p_k),\ f_k,\ v_k]$ in $I_j$ where $u_k \neq v_k$, then add a new arc $(u_k, v_k; [x])$ to the parse graph.

(c-2) Otherwise, add a new vertex $v_k$ and a new arc $(u_k, v_k; [x])$ to the parse graph, and add a new item $[B_k \to \gamma_k A \cdot \delta_k\ (p_k),\ f_k,\ v_k]$ to $I_j$.

We describe our algorithm for constructing a parse graph as follows:

Algorithm 1. An algorithm for constructing a parse graph

A context-free grammar $G = (V_N, V_T, P, S)$ and a sentence $a_1 a_2 \cdots a_n$ are given.

[step 1] Add the metasymbol "\$" to the tail of the sentence. Add the production $S' \to S\$\ (0)$ to $P$. Create the parse graph consisting of $v_s(0)$. Create the item list $I_0$ consisting of $[S' \to \cdot S\$\ (0),\ 0,\ v_s(0)]$.

[step 2] Create the item lists $I_0, I_1, \ldots, I_{n+1}$ in order, by performing the following operations from $k = 0$ to $k = n$.

(1) Perform the predictor or the completer to add items to the item list $I_k$, until no more items can be added to $I_k$.

(2) Then, perform the scanner to add items to $I_{k+1}$.

[step 3] If $I_{n+1}$ has an item of the form $[S' \to S\$ \cdot\ (0),\ 0,\ v]$, then the parser accepts the sentence, and the algorithm terminates. Otherwise, the parser rejects the sentence, and the algorithm terminates.

Note that this algorithm is the same as Earley's algorithm except for the portion that constructs the parse graph.

As for the time and space complexities of this algorithm, the following theorem holds.

Theorem 1. The time and space complexities of our algorithm are both $O(n^3)$, where $n$ is the length of the sentence.

(proof) Consider the number of items in the item lists. By the definitions of the three operations, namely the scanner, the predictor and the completer, no item list contains two items whose first and second components are both the same. Therefore, each item list has $O(n)$ items, because the number of kinds of the first component is constant, and that of the second component is not more than $n + 2$. Hence, the number of items of the parse list is $O(n^2)$, because the parse list consists of $n + 2$ item lists. Next, consider the time and space complexities of the operations per item.

(1) As for the scanner, the time and space complexities are both $O(1)$.

(2) As for the predictor, at most $O(|P|)$ items are added to the item list, and $O(|P|)$ vertices and arcs are added to the parse graph, where $|P|$ denotes the number of productions. Therefore, the time and space complexities are both $O(|P|) = O(1)$.

(3) As for the completer, if the second component of the processed item is $f$, the completer scans all items in $I_f$, adds at most $O(n)$ items to the item list, and adds at most $O(n)$ vertices and arcs to the parse graph. Therefore, the time and space complexities are both $O(n)$.

Consequently, the time and space complexities of the operations per item are $O(n)$. Therefore, the time and space complexities of the parse graph construction algorithm are both $O(n^3)$. □
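The counting argument of the proof can be summarized in one line. The grouping below is only a restatement of the three bounds used above (number of item lists, items per list, and worst-case cost per item), not an additional result:

```latex
\[
\underbrace{(n+2)}_{\text{item lists}}
\times \underbrace{O(n)}_{\text{items per list}}
\times \underbrace{O(n)}_{\text{cost per item (completer)}}
= O(n^2) \times O(n) = O(n^3).
\]
```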

Compared with (e-1) in Section 2, the time complexity for constructing a parse graph is the same as that of Earley's algorithm, but the space complexity is worse because the number of arcs in a parse graph is $O(n^3)$.

4 Enumeration of Parses

4.1 Extracting parses

In order to extract parses from a parse graph, we introduce traversal paths of a parse graph. The notation $\pi(u, v)$ represents traversal paths from $u$ to $v$.

A traversal path from a vertex $u$ to a vertex $v$ is defined as follows, provided that $L(u) = L(v)$.

(1) A null sequence is defined as a traversal path if $u = v$.

(2) The sequence of arcs $e_1 e_2 \cdots e_n$, where $e_i = (u_i, v_i)$, is defined as a traversal path if $u = u_1$, $v_1 = u_2$, $v_2 = u_3$, $\ldots$, $v_{n-1} = u_n$, $v_n = v$.

(3) Let $e_1 e_2 \cdots e_n$ be a traversal path from $u$ to $v$, in which an arc $e_i$ is labeled with a nonterminal symbol $A$. The sequence $e_1 \cdots e_{i-1}\, \pi(v_s(A), v_t(A))\, e_{i+1} \cdots e_n$, in which $e_i\langle A \rangle$ is replaced by a traversal path $\pi(v_s(A), v_t(A))$, is defined as a traversal path.

(4) Let $e_1 e_2 \cdots e_n$ be a traversal path from $u$ to $v$, in which an arc $e_i$ is labeled with the index of a vertex $w$. The sequence $e_1 \cdots e_{i-1}\, \pi(v_s(w), w)\, e_{i+1} \cdots e_n$, in which $e_i[w]$ is replaced by a traversal path $\pi(v_s(w), w)$, is defined as a traversal path.

In particular, a traversal path that has only arcs labeled with the index of a production is called a proper traversal path. The notation $\pi^*(u, v)$ represents proper traversal paths from $u$ to $v$. This notation is also used to represent the sequence of the labels of a proper traversal path.
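Rules (1) to (4) can be read as a recursive expansion procedure, sketched here in Python. Everything in the block is an assumption made for illustration: the arc encoding `(tail, head, kind, label)`, the tables `LAYER_SOURCE` and `NONTERM_ENDS`, and the hand-built toy graph, which is not the graph of Figure 3. The sketch enumerates all proper traversal paths exhaustively and assumes that every layer is acyclic and that the substitutions terminate.

```python
# Arcs are (tail, head, kind, label); kind is "prod", "nonterm" or "vertex".
# Hand-built toy graph for illustration only (not the graph of Figure 3).
ARCS = [
    ("s0", "a", "nonterm", "S"),   # arc labeled with the nonterminal S
    ("a", "b", "vertex", "y"),     # arc labeled with the index of vertex y
    ("b", "t0", "prod", 2),
    ("sS", "tS", "prod", 1),       # layer L(S): one arc labeled with production 1
    ("x", "m1", "prod", 5), ("m1", "y", "prod", 3),
    ("x", "m2", "prod", 6), ("m2", "y", "prod", 4),
]
# Source vertex of the layer that contains each vertex, i.e. v_s(v).
LAYER_SOURCE = {"s0": "s0", "a": "s0", "b": "s0", "t0": "s0",
                "sS": "sS", "tS": "sS",
                "x": "x", "m1": "x", "m2": "x", "y": "x"}
# (v_s(A), v_t(A)) for each nonterminal layer.
NONTERM_ENDS = {"S": ("sS", "tS")}


def proper_paths(u, v):
    """Yield the label sequence of every proper traversal path from u to v."""
    if u == v:
        yield []                     # rule (1): the null sequence
        return
    for tail, head, kind, label in ARCS:
        if tail != u:
            continue
        if kind == "prod":           # rule (2): keep a production arc as it is
            firsts = [[label]]
        elif kind == "nonterm":      # rule (3): substitute pi(v_s(A), v_t(A))
            firsts = list(proper_paths(*NONTERM_ENDS[label]))
        else:                        # rule (4): substitute pi(v_s(w), w)
            firsts = list(proper_paths(LAYER_SOURCE[label], label))
        for first in firsts:
            for rest in proper_paths(head, v):
                yield first + rest


if __name__ == "__main__":
    for labels in proper_paths("s0", "t0"):
        print(labels)
```

On this toy graph the two resulting label sequences differ only in the part substituted for the vertex-labeled arc, which is how a single arc can stand for several alternative sub-derivations.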

As for the relationship between proper traversal paths and parses, the following theorem holds.

Theorem 2. If there exist two items $[A \to \alpha \cdot \beta\gamma\ (p),\ f,\ u] \in I_j$ and $[A \to \alpha\beta \cdot \gamma\ (p),\ f,\ v] \in I_k$, where $\alpha, \beta, \gamma \in V^*$, then the sequence of the labels $\pi^*(u, v)$ is the reverse of the sequence of the production numbers used for the rightmost derivation $\beta \Rightarrow^*_{rm} a_{j+1} \cdots a_k$.

(proof) It is easy to prove this theorem by induction on the length of the derivation sequence. □

Let $v_t(0)$ be the third component $v$ of the item $[S' \to S\$ \cdot\ (0),\ 0,\ v] \in I_{n+1}$. According to Theorem 2, the sequences of the labels $\pi^*(v_s(0), v_t(0))$ represent the right parses of the parsed sentence. An example of a proper traversal path of the parse graph in Figure 3 is shown in Figure 4, where $v_s(0)$ is vertex #0 and $v_t(0)$ is vertex #20.

A right parse can be extracted from the parse graph by searching for a proper traversal path from $v_t(0)$ toward $v_s(0)$. This extraction can be done without backtracking, because each layer has only one source. Therefore, the following theorem holds.

Theorem 3. If the given grammar is cycle-free, the time complexity for extracting a parse is $O(n)$, where $n$ is the length of the sentence.

(proof) If the grammar is cycle-free, the length of the parse is $O(n)$. Therefore, the time complexity is $O(n)$. □

Compared with (e-2) in Section 2, the time complexity for extracting a parse is better than that of Earley's algorithm.
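To make the notion of a right parse concrete, the following Python sketch replays a right parse as a rightmost derivation. It assumes the grammar of Figure 1 and the right parse 1 5 3 2 5 3 2 given with Figure 4; the function name `derive` and the table layout are illustrative. Since a right parse is the reverse of the production sequence of a rightmost derivation, the sketch walks the parse backwards and rewrites the rightmost nonterminal at each step.

```python
# Grammar of Figure 1: index -> (left side, right side).
PRODUCTIONS = {
    1: ("S", ()),
    2: ("S", ("S", "J")),
    3: ("J", ("F",)),
    4: ("J", ("I",)),
    5: ("F", ("x",)),
    6: ("I", ("x",)),
}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}


def derive(right_parse, start="S"):
    """Replay a right parse and return the sentence it derives."""
    form = [start]
    for p in reversed(right_parse):
        lhs, rhs = PRODUCTIONS[p]
        # Position of the rightmost nonterminal in the sentential form.
        # max() raises ValueError if no nonterminal is left to rewrite.
        pos = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[pos] != lhs:
            raise ValueError(f"production {p} does not apply to {form[pos]}")
        form[pos:pos + 1] = rhs
    return "".join(form)


if __name__ == "__main__":
    print(derive([1, 5, 3, 2, 5, 3, 2]))
```

A sequence that is not a valid right parse is rejected with a `ValueError`, either because the production does not match the rightmost nonterminal or because no nonterminal remains.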

4.2 An algorithm for parse enumeration

Using a parse graph, enumeration of the parses in the order of the total weight is equivalent to enumeration of the proper traversal paths from $v_s(0)$ to $v_t(0)$ in the order of length. While many researchers have developed algorithms for finding the $k$ shortest paths [9, 10, 11, 12, 13], we apply the one developed by Katoh, Ibaraki and Mine [10] to the parse graph recursively. For lack of space, we explain only the outline of the algorithm. The details of the algorithm are described in [14]. In the following discussion, the $k$-th shortest traversal path from $v_s(0)$ to $v_t(0)$ is referred to as $\pi^k$.

Figure 4: A proper traversal path from vertex #0 to vertex #20 (right parse: 1 5 3 2 5 3 2)

First of all, derive the shortest path tree for $v_s(0)$, denoted as $T(v_s(0))$, which consists of the arcs of the shortest paths from $v_s(0)$ to all other vertices. The shortest path tree can easily be derived in the algorithm for constructing a parse graph. $\pi^1$ can be extracted from $T(v_s(0))$. $\pi^2$ consists of the path of $T(v_s(0))$ from $v_s(0)$ to a vertex $u$, the arc $(u, v)$ where $v$ is one of the vertices on $\pi^1$, and the subpath of $\pi^1$ from $v$ to $v_t(0)$. Therefore, the number of candidates for $\pi^2$ is the same as the sum of the in-degrees of all vertices on the shortest path. As for the parse graph, the length of the shortest path and the in-degree of a vertex are both $O(n)$ [14], and hence we can derive $\pi^2$ in $O(n^2)$. In order to derive $\pi^3$, all paths from $v_s(0)$ to $v_t(0)$ except $\pi^1$ and $\pi^2$ are divided into three sets as follows (see Figure 5):

(1) The set of paths that join the subpath common to $\pi^1$ and $\pi^2$. The shortest path in this set is referred to as $\pi_a$.

(2) The set of paths that join $\pi^1$, and contain the subpath common to $\pi^1$ and $\pi^2$ as their final subpath. The shortest path in this set is referred to as $\pi_b$.

(3) The set of paths that join $\pi^2$, and contain the subpath common to $\pi^1$ and $\pi^2$ as their final subpath. The shortest path in this set is referred to as $\pi_c$.

$\pi_a$, $\pi_b$, and $\pi_c$ can each be derived in $O(n^2)$ in the same manner as $\pi^2$. $\pi^3$ is the shortest one of $\pi_a$, $\pi_b$, and $\pi_c$, and the rest of these paths are stored in another set as the candidates for $\pi^4$. $\pi^4, \pi^5, \ldots$ are derived by repeating a similar calculation. Therefore, the time and space complexities of the enumeration of the $k$ shortest paths are $O(n^3 + kn^2)$ and $O(n^2 + kn)$, respectively.

In the above discussion, the $k$ shortest paths are derived. However, we can also derive the $k$ longest paths in the same manner.
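The idea of producing paths in order of length can be tried out with the following Python sketch. It is not the algorithm of Katoh, Ibaraki and Mine [10] and does not achieve the bounds stated above: it is a simpler best-first search, swapped in for illustration, that works on a flat acyclic graph without the label substitution of the parse graph. The function name, the adjacency-list format, and the example graph are all assumptions. Each partial path is ranked by its length so far plus the exact shortest remaining distance to the target, so complete paths leave the queue in nondecreasing order of length.

```python
import heapq


def k_shortest_paths(arcs, source, target, k):
    """Return up to k (length, path) pairs in nondecreasing order of length.

    arcs maps a vertex to a list of (next_vertex, weight) in an acyclic graph.
    """
    dist = {}  # dist[v] = length of the shortest path from v to target

    def to_target(v):
        if v not in dist:
            if v == target:
                dist[v] = 0
            else:
                dist[v] = min((w + to_target(n) for n, w in arcs.get(v, [])),
                              default=float("inf"))
        return dist[v]

    # Heap entries: (best possible total length, length so far, path so far).
    heap = [(to_target(source), 0, [source])]
    found = []
    while heap and len(found) < k:
        bound, length, path = heapq.heappop(heap)
        if bound == float("inf"):
            break                    # the target is unreachable from here on
        v = path[-1]
        if v == target:
            found.append((length, path))
            continue
        for n, w in arcs.get(v, []):
            heapq.heappush(heap, (length + w + to_target(n), length + w, path + [n]))
    return found


if __name__ == "__main__":
    graph = {"s": [("a", 1), ("b", 2)], "a": [("t", 1), ("b", 1)], "b": [("t", 1)]}
    for length, path in k_shortest_paths(graph, "s", "t", 3):
        print(length, path)
```

Negating every weight before the call turns the same search into an enumeration of the longest paths on an acyclic graph, which mirrors the remark about the $k$ longest paths.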

Figure 5: The relation among $\pi^1$, $\pi^2$, and $\pi^3$

We summarize the time and space complexities of our algorithms in Table 1.

Table 1: The time and space complexities of our algorithms ($n$: the length of the sentence)

Complexity    Construction of parse graph    Enumeration of k parses
Time          $O(n^3)$                       $O(n^3 + kn^2)$
Space         $O(n^3)$                       $O(n^2 + kn)$

5 Conclusion

In this paper, we have presented an algorithm for the enumeration of parses in the order of acceptability. This algorithm can be applied to general context-free languages. In order to enumerate parses efficiently, we have introduced a data structure suitable for the enumeration, called the parse graph. Using a parse graph, we can enumerate $k$ parses in the order of acceptability efficiently in $O(n^3 + kn^2)$.

Acknowledgement

We thank Prof. Isao Shirakawa and Prof. Hideo Miyahara of Osaka University for their helpful support.

The first author thanks Prof. Toshiro Araki, Dr. Hiroshi Deguchi, Dr. Shinji Shimojo of Osaka University, and Mr. Toshiyuki Masui of SHARP Corporation for their helpful suggestions and comments on this paper.

References

[1] Aho, A. V. and Ullman, J. D., The Theory of Parsing, Translation, and Compiling, Vol. 1: Parsing, Prentice-Hall, 1972.

[2] Kasami, T., "An efficient recognition and syntax analysis algorithm for context-free languages", Science Report, AFCRL-65-758, Air Force Cambridge Research Laboratory, 1965.

[3] Younger, D. H., "Recognition and parsing of context-free languages in time $n^3$", Information and Control, 10, pp. 189-208, 1967.

[4] Earley, J., "An efficient context-free parsing algorithm", Communications of the A.C.M., 13-2, pp. 94-102, 1970.

[5] Valiant, L. G., "General context-free recognition in less than cubic time", J.C.S.S., 10, pp. 308-315, 1975.

[6] Graham, S. L., Harrison, M. A., and Ruzzo, W. L., "On line context-free recognition in less than cubic time", Proc. 8th Annu. A.C.M. Symp. on Theory of Computing, pp. 112-120, 1976.

[7] Graham, S. L., and Harrison, M. A., "Parsing of general context-free languages", Advances in Computers, 14, Academic Press, pp. 415-462, 1976.

[8] Graham, S. L., and Harrison, M. A., "An improved context-free recognizer", A.C.M. Trans. on Programming Languages and Systems, 2-3, pp. 415-462, 1980.

[9] Yen, J. Y., "Finding the K shortest loopless paths in a network", Management Science, 17, pp. 712-716, 1971.

[10] Katoh, N., Ibaraki, T., and Mine, H., "An efficient algorithm for K shortest simple paths", Networks, 12, pp. 411-427, 1982.

[11] Fox, B. L., "Data structures and computer science techniques in operations research", Operations Research, 26, pp. 686-717, 1978.

[12] Denardo, E. V., and Fox, B. L., "Shortest-route methods: 1. reaching, pruning and buckets", Operations Research, 27, pp. 161-186, 1979.

[13] Lawler, E. L., "A procedure for computing the K best solutions to discrete optimization problems and its application to the shortest path problem", Management Science, 18, pp. 401-405, 1972.

[14] Yamai, N., "A study for parsing of ambiguous languages using hierarchical graph representation of all derivations", Master Thesis of Osaka University, 1986 (in Japanese).
