A S Y N T A X P A R S E R B A S E D ON THE C A S E D E P E N D E N C Y G R A M M A R A N D ITS E F F I C I E N C Y
T o r u H i t a k a and Sho Y o s h i d a
D e p a r t m e n t of E l e c t r o n i c s , K y u s h u U n i v e r s i t y , Fukuoka, J a p a n
S U M M A R Y
A u g u m e n t e d t r a n s i t i o n n e t w o r k
g r a m m a r s (ATNGs) or a u g u m e n t e d c o n t e x t - free g r a m m a r s are g e n e r a l l y u s e d in n a t u r a l l a n g u a g e p r o c e s s i n g systems. The a d v a n t a g e s of A T N G s m a y be summa- r i z e d as i) e f f i c i e n c y of r e p r e s e n t a - tion, 2) p e r s p i c u i t y , 3) g e n e r a t i v e power, and the d i s a d v a n t a g e of A T N G s is t h a t it is d i f f i c u l t to get an effi- c i e n t p a r s i n g a l g o r i t h m b e c u a s e of the f l e x i b i l i t y of their c o m p l i c a t e d
a d d i t i o n a l functions.
In this paper, the s y n t a x of J a p a n e s e s e n t e n c e s , b a s e d on case d e p e n d e n c y r e l a t i o n s are s t a t e d first, a n d t h e n we give an b o t t o m - u p and b r e a d t h - f i r s t p a r s i n g a l g o r i t b x n w h i c h p a r s e s input s e n t e n c e u s i n g time O(n 3) and m e m o r y space O(n2), w h e r e n is the l e n g t h of input sentence. M o r e o v e r , it is shown t h a t this p a r s e r r e q u i r e s time O(n2), w h e n e v e r e a c h B - p h r a s e in input s e n t e n c e is u n a m b i g u o u s in its g r a m m a t i c a l structure. T h e r e f o r e , the e f f i c i e n c y of this p a r s e r is n e a r l y equal to the E a r l e y ' s p a r s e r w h i c h is the m o s t e f f i c i e n t p a r s i n g m e t h o d for g e n e r a l c o n t e x t - f r e e g r a m m a r s .
1. F U N D A M E N T A L S OF J A P A N E S E S E N T E N C E T h e J a p a n e s e s e n t e n c e is o r d i n a r i l y w r i t t e n in k a n a (phonetic) l e t t e r s a n d k a n j i (ideographic) c h a r a c t e r s w i t h o u t l e a v i n g a space b e t w e e n words. F r o m the v i e w p o i n t of m a c h i n e p r o c e s s i n g , however, it is n e c e s s a r y to e x p r e s s c l e a r l y the units c o m p o s i n g the s e n t e n c e in such a w a y as to leave a space b e t w e e n e v e r y w o r d as in English. We have no s t a n d a r d w a y of s p a c i n g the units t h o u g h the n e e d for this has b e e n d e m a n d e d for a long time.
We give some e x a m p l e s in F i g u r e i. T h e first s e n t e n c e in the figure is of o r d i n a r y w r i t t e n form.
The s e c o n d i n d i c a t e s a w a y of s p a c i n g (i.e. p u t t i n g a space b e t w e e n e v e r y word).
The t h i r d i n d i c a t e s a n o t h e r w a y of s p a c i n g (i.e. p u t t i n g a space b e t w e e n e v e r y B - p h r a s e ) .
N o w a d a y s , m a n y o t h e r s p a c i n g m e t h o d s have b e e n t r i e d in s e v e r a l i n s t i t u t e s in Japan.
In this paper, input s e n t e n c e s are g i v e n in c o l l o q u i a l style in w h i c h a s p a c i n g symbol is p l a c e d b e t w e e n two s u c c e s s i v e B - p h r a s e s .
In J a p a n e s e sentences,
BUNSETSUs(B-
phrase) are the m i n i m a l m o r p h o l o g i c a l units of case d e p e n d e n c y , and the s y n t a x of J a p a n e s e s e n t e n c e s c o n s i s t s of (i) the s y n t a x of B - p h r a s e as a s t r i n g of words, and (2) the s y n t a x of a s e n t e n c e as a s t r i n g of B - p h r a s e s .A B - p h r a s e u s u a l l y p r o n o u n c e d w i t h o u t p a u s i n g c o n s i s t s of t w o parts - - m a i n p a r t [or e q u a l l y an i n d e p e n d e n t p a r t in the c o n v e n t i o n a l school g r a m m a - tical term] and an a n n e x p a r t w h i c h is p o s t p o s i t i o n e d . We d e n o t e the c o n n e c - tion of two p a r t s in a B - p h r a s e by a dot if n e c e s s a r y . A m a i n part, w h i c h is a c o n c e p t u a l w o r d [or e q u a l l y an inde- p e n d e n t word] (e.g. noun, verb,
a d j e c t i v e or adverb) p r o v i d e s m a i n l y the i n f o r m a t i o n of the concept. On the o t h e r hand, an a n n e x part, a p o s s i b l y null string of s u f f i x w o r d s (e.g. auxi- liary verbs or particles) p r o v i d e s the i n f o r m a t i o n c o n c e r n i n g the k a k a r i u k e r e l a t i o n a n d / o r the s u p p l e m e n t a r y
i n f o r m a t i o n (e.g. the s p e a k e r ' s a t t i t u d e t o w a r d s the c o n t e n t s of the sentence, tense, etc.)
A w o r d w has it's s p e l l i n g W, p a r t of s p e e c h H and i n f l e x i o n K. We call
(W,H,K) the w o r d s t r u c t u r e of w.
S u p p o s e t h a t a string b of l e n g t h n be a B - p h r a s e . Then, there e x i s t an i n d e p e n d e n t w o r d w 0 a n d s u f f i x w o r d s Wl, w z, ... , w~, and
b = w 0 w I . • • w~
B - p h r a s e s and Termi(HQ,Kz) m e a n s a w o r d w h o s e part of speech ~nd i n f l e x i o n are H£, KZ r e s p e c t i v e l y can be a r i g h t - m o s t s u b w o r d of B-phrases.
(i), (2) are c a l l e d the rules of B- phrase structure, and
(W0,H0,K 0) (Wi,HI,K ~) "''(Wz,H~,K ~) • . . ( 3 ) is c a l l e d B - p h r a s e s t r u c t u r e of b. If
(3) satisfies the c o n d i t i o n (i), w 0 w l w
• . . w Z is c a l l e d to be a left p a r t i a l 2
B-phrase.
The kakariuke r e l a t i o n is the depen- dency r e l a t i o n b e t w e e n two B - p h r a s e s in a sentence. A B - p h r a s e has the syn- tactic functions of g o v e r n o r and
dependent. The f u n c t i o n of g o v e r n o r is m a i n l y r e p r e s e n t e d by the i n d e p e n d e n t w o r d of B-phrase. The f u n c t i o n of d e p e n d e n t is m a i n l y r e p r e s e n t e d by the string of p a r t i c l e s w h i c h is the right- m o s t s u b s t r i n g of B - p h r a s e and by the w o r d in front of it (right-most non- p a r t i c l e word).
E v e r y p a r t i c l e has the s y n t a c t i c and p a r t i a l l y s e m a n t i c d e p e n d e n t function w i t h its own degree of power. The p a r t i c l e w h o s e p o w e r of d e p e n d e n t f u n c t i o n is s t r o n g e s t of all p a r t i c l e s a p p e a r i n g in the string of p a r t i c l e s is c a l l e d the r e p r e s e n t a t i v e particle. Therefore, the syntactic f u n c t i o n of d e p e n d e n t of a B - p h r a s e is m a i n l y r e p r e s e n t e d by the r e p r e s e n t a t i v e p a r t i c l e and by the r i g h t - m o s t non- p a r t i c l e word.
Let (W0,H0,K0) , (Wi,Hi,Ki) , (W~,H~,K~) be the w o r d s t r u c t u r e s of i n d e p e n d e n t J word, r i g h t - m o s t n o n - p a r t i c l e w o r d and r e p r e s e n t a t i v e p a r t i c l e of a B-phrase, respectively. Then, <W^,H^>_, <W..,Hi, u u ~ & Hj> d are c a l l e d the i n r o r m a t l o n or g o v e r n o r and the i n f o r m a t i o n of depen- dent of the B - p h r a s e r e s p e c t i v e l y , and the pair (<W0,H0>~,<Wi,Hi,Hj>d) is c a l l e d d e p e n d e n c y ~ i n f o r m a t i 6 n of the B-phrase.
There are m a n y types of d e p e n d e n c y r e l a t i o n such as agent, patient, instrument, location, time, etc. Let C be the set of all types of d e p e n d e n c y relation. The set of all p o s s i b l e d e p e n d e n c y r e l a t i o n s from a B - p h r a s e b l to a B - p h r a s e b 2 is founded on the i n f o r m a t i o n of d e p e n d e n t of b I and the i n f o r m a t i o n of g o v e r n o r of b 2. There- fore, there is a f u n c t i o n 6 w h i c h com- putes the set of all p o s s i b l e d e p e n d e n - cy r e l a t i o n s ~(a,8) b e t w e e n a B - p h r a s e of d e p e n d e n c y i n f o r m a t i o n ~ and a n o t h e r B - p h r a s e of d e p e n d e n c y i n f o r m a t i o n 8. The f u n c t i o n ~ is r e a l i z e d by the d e p e n d e n c y d i c t i o n a r y r e t r i e v e d w i t h the key of two d e p e n d e n c y informations.
The o r d e r of B - p h r a s e is r e l a t i v e l y free in a simple sentence, except for one c o n s t r a i n t that the p r e d i c a t i v e B - p h r a s e g o v e r n i n g the w h o l e sentence must be in the s e n t e n c e ' s final posi- tion. J a p a n e s e is a post p o s i t i o n a l in this sense.
The p a t t e r n of the d e p e n d e n c y r e l a t i o n s in a s e n t e n c e has some
s t r u c t u r a l p r o p e r t y w h i c h is c a l l e d the rules of d e p e n d e n c y structure, and the d e p e n d e n c y r e l a t i o n s in a s e n t e n c e are c a l l e d the d e p e n d e n c y s t r u c t u r e of a sentence. The d e p e n d e n c y s t r u c t u r e of a s e n t e n c e is shown in figure 2, w h e r e arrows i n d i c a t e d e p e n d e n c y rela- tions of v a r i o u s types. The rules of d e p e n d e n c y structure c o n s i s t of follow- ing three conditions.
i Each B - p h r a s e e x c e p t one at the s e n t e n c e final is a d e p e n d e n t of e x a c t l y one B - p h r a s e a p p e a r i n g after it.
ii A d e p e n d e n c y r e l a t i o n b e t w e e n any two B - p h r a s e s does not cross w i t h a n o t h e r d e p e n d e n c y r e l a t i o n s in a sentence.
iii No two d e p e n d e n c y r e l a t i o n s d e p e n d i n g on the same g o v e r n o r are the same.
Let N be the number of B - p h r a s e s in a input sentence, and all B - p h r a s e s are n u m b e r e d d e s c e n d i n g l y from right to left (see figure 2). We shall fix an input sentence, t h r o u g h o u t this chapter. Let DI(i) be the set of all d e p e n d e n c y i n f o r m a t i o n s of i-th B-phrase.
Definition: A d e p e n d e n c y file DF of a s e n t e n c e is a finite set of 5-tuples.
(i,j,ai,ej,c) 6 DF
.... _ ~ { N=i>j=l, a i E DI (i), cde . ~j 6 DI(j) and c E ~ ( a i , a j ) . Definition: If a subset of DF
s a t i s f i e s f o l l o w i n g c o n d i t i o n s i) to 5), it is c a l l e d a d e p e n d e n c y s t r u c t u r e from the Z-th B - p h r a s e to the m - t h B- p h r a s e (N~Z>m~i) and d e n o t e d by DS(£,m) or DS' (i,m).
i) If (i,J,ei,~j,c) 6 DS(£,m), then £~i>jAm.
2) For a r b i t r a r y i(ZAi>m), there exists unique j , a i , e j , c such that
(i,j,ai,ej,c) ~ DS(Z,m).
(Uniqueness of Dependent) 3) If (i,j,a~,a~,c) 6 DS(£,m) and
, , ~ o
(j,k,~j,~k,C) % DS(Z,m), then ~ = ~
~ , O J "
(Uniqueness of B - p h r a s e structure) , 4 ) If (i,J,~i,~j,c) ~ DS(£,m), (i ,j,~f ,~j, ,c ) E DS(£,m) and i>i'>j, then j,hj.
(Nest S t r u c t u r e of Dependency)
5) If ( i , j , @ i , ~ , c ) e DS(£,m) (i',j,a i, ,~j,c') ~ D~(£,m) and ~ i ' , then c ~ c'.
(Inhibition of D u p l i c a t i o n of a Case) The set of all d e p e n d e n c y s t r u c t u r ~ from i-th B - p h r a s e to m - t h B - p h r a s e is d e n o t e d by ~(~,m). A n y D S ( N , i ) ~ ~(N,i) is c a l l e d a d e p e n d e n c y s t r u c t u r e of the input sentence. The d e p e n d e n c y infor- m a t i o n of j-th B - p h r a s e is unique in DS(i,m), since 2) and 3) hold. Let JDiDS(Z,m) and jGDS(Z,m) be the depen- dency i n f o r m a t i o n of the j-th B - p h r a s e i n D ~ £ , m ) a n d ~ s e t of all the depen- d e n c y r e l a t i o n s that the j-th B - p h r a s e governs in DS(£,m), respectively.
def ~ i , ~ , C )
JGDS(i,m) u__. {c I ( i , J ~ D s ( ~ m ) }
Definition: If the k-th B - p h r a s e (i~k~_m) in DS(£,m) has the f o l l o w i n g property, k ( t h e k - t h B-phrase) is called a joint of DS(£,m):
For any ( i , j , a i , ~ j , c ) ~ DS(~,m) , k~i or J~k.
Let j~(=£) > j, > Jl > "'" > j (=m) be
u ,
the d e s c e n d l n g s e q u e n c e of ale the joints of DS(i,m) (see figure &). Then, the Jk-th B - p h r a s e is c a l l e d the k - t h joint of DS(£,m). There is a d e p e n d e n c y r e l a t i o n from k-th joint
(dependent) to k + i - t h joint(governor) in DS(£,m). Let J.DS(£,m) be a set of all the joints of DS(£,m). DS(£,m/i,j) a subset of DS(Z,m), is d e f i n e d as follows:
D S ( £ , m / i , j ) - ~ { (p,q,av,~o,c) I
(p,q,ap,aq,C) ~ DS(~,my, i~p>q~j}. Lemma i. For any p o s i t i v e i n t e g e r £, i, j, m (N~£~i>j~m), the f o l l o w i n g p r o p o s i t i o n s hold.
(i) D S ( i , m / i , j ) 6 ~ ( i , j ) , if j is a joint of DS(i,m).
(ii) DS(I,j) U D S ( j , m ) ~ ~ ( £ , m ) , if and only if JDiDS(Z,j) = JDiDS(j,m) . (iii) { (Z+i,j ,~, ~,c) }uDS (~,~) 6 ~ ( Z + i ,
m) if and only if (i+l,j,e,8,c) E D F , 8 = J D i D S ( Z , m ) , j E J.DS(£,m) and c ~ j G D S ( Z , m ) .
(iv) If (jk,Jk+1,ak,ek+1,c) ~ DS(j ,m) (k=0,1,2,-..), then Jk is the k- th joint of DS(J0,m).
S y n t a x a n a l y s i s of a J a p a n e s e
s e n t e n c e is d e f i n e d as giving B - p h r a s e s t r u c t u r e s and d e p e n d e n c y s t r u c t u r e of the sentence.
2. THE P A R S I N G A L G O R I T H M AND ITS E F F I C I E N C Y
In this chapter, we shall give a p a r s i n g m e t h o d w h i c h will p a r s e an input sentence using time O(n ~) and
space O(n~), w h e r e n is the length of input sentence. Moreover, if the d e p e n d e n c y i n f o r m a t i o n of each B - p h r a s e is unambiguous, the time v a r i a t i o n is quadratic.
The e s s e n c e of the p a r s i n g a l g o r i t h m is t h e c o n s t r u c t i o n of B - p h r a s e parse list BL and d e p e n d e n c y p a r s e list DL w h i c h are c o n s t r u c t e d e s s e n t i a l l y by a
"dynamic p r o g r a m m i n g " method. The p a r s i n g a l g o r i t h m c o n s i s t s of four m i n o r a l g o r i t h m s that are the c o n s t r u c - tion of BL, the o b t a i n i n g of B - p h r a s e structure, the c o n s t r u c t i o n of DL and the o b t a i n i n g of d e p e n d e n c y structure.
13-PHRASE PARSE LIST
Let b be a string of n length and b(i) denote the i-th c h a r a c t e r from the left end of it.
b = b ( 1 ) (2) ... b(n). The B - p h r a s e p a r s e list of b
c o n s i s t s of n m i n o r lists BL(1), BL(2), • .. , BL(n).
[ ] F o r m of items in BL(j) (i, WS, DI)
where, IL_i < j ~ n , WS is a w o r d s t r u c t u r e and DI is a d e p e n d e n c y information.
[ ] S e m a n t i c s of (i, WS, DI)EBL(j) (i, WS, DI)E BL(j), if and only if there exists a s e q u e n c e of w o r d s w o, w l, ... , w£ s a t i s f y i n g
following two c o n d i t i o n s :
i) b(1)b(2) ... b ( i ) = w 0 w I .. • w~_ I, b(i+l)b(i+2) ... b(j)=w£, and WS is the w o r d s t r u c t u r e of w£. 2) The string of w o r d w _ w I ... w Z is
• D
a left p a r c l a l B - p h r a s e of depen- d e n c y i n f o r m a t i o n DI.
A L G O R I T H M F O R THE C O N S T R U C T I O N OF BL
Input.
An input string b=b(1)(2)• • . b ( n ) .
Output.
The B - p h r a s e parse list BL(1), BL(2), ... , BL(n).Method.
Step i: Find all the i n d e p e n d e n t w o r d w h i c h are the left- m o s t subwords of b, u s i n g i n d e p e n d e n t w o r d d i c t i o n a r y and for each indepen- dent w o r d w=b(1)b(2) ''. b(j), add(0, (W,H,K),a) to BL(j) where, (W,H,K) is the w o r d s t r u c t u r e of w and ~ =
(<W,H>K, <W,H,-> d) . Then, set the c o n t r o I w o r d i to 1 and repeat Step 2 until ~ = n •
Step 2: O b t a i n all the suffix words w h i c h are the l e f t - m o s t s u b w o r d s of B(i+l)B(i+2) ... b(n) and for each suffix w o r d w = b ( i + l ) b ( i + 2 ) ... b(k) of w o r d s t r u c t u r e (W' ,H' ,K') , and for each item (j, <W,H,K>,a) # BL(i), add
C ( H , K , K ' ) . (W',H')0a is a d e p e n d e n c y i n f o r m a t i o n d e f i n e d as follows.
i If H' is a a u x i l i a r y verb• then (W',H')o~ d e f (<~>g,<W,,H,,_>d) where• <a>g is the i n f o r m a t i o n of g o v e r n o r or a.
ii Let <W",H",H"' > be the i n f o r m a - t i o n of d e p e n d e n t of ~. W h e n H' is a particle,
(W,,H,)o a def
. . . . (<a>g,<W",H",H'>d)
if the p o w e r of d e p e n d e n c y f u n c t i o n of H' is s t r o n g e r than that of H"' , and e l s e
(W,,H,)o ~ def ~.
T h e r e e x i s t s u p p e r limit in the l e n g t h of w o r d s a n d there e x i s t s u p p e r limit in the n u m b e r of d e p e n d e n c y i n f o r m a t i o n s of all left p a r t i a l B- p h r a s e of a(1)a(2) ... a(i). Therefore, there exists u p p e r limit for the
n e c e s s a r y size of m e m o r y space of BL(i) and the t h e o r e m 1 follows.
T h e o r e m i.
A l g o r i t h m for the c o n s t r u c t i o n of BL r e q u i r e s O(n) m e m o r y space and O(n) e l e m e n t a r y o p e r a t i o n s .
We shall n o w d e s c r i b e h o w to find a B - p h r a s e s t r u c t u r e of s p e c i f i e d d e p e n - d e n c y i n f o r m a t i o n from BL. The m e t h o d is g i v e n as follows.
A L G O R I T H M F O R O B T A I N I N G A B - P H R A S E S T R U C T U R E OF A N I N P U T S T R I N G
Input.
The s p e c i f i e d d e p e n d e n c y i n f o r m a t i o n ~ and BL.Output.
A B - p h r a s e s t r u c t u r e of d e p e n d e n c y i n f o r m a t i o n a or the e r r o r signal "error".Method.
STEP i: S e a r c h any i t e m (i,(W,H,K),a) in BL(n) such as Termi (H,H). If there is no such item, then emit "error" and halt. O t h e r w i s e , o u t p u t the w o r d s t r u c t u r e (W,H,K), set the r e g i s t e r R to (i,(W,H,K),a) and r e p e a t the step 2 until i = 0.STEp 2: Let R be (i,(W,H,K),e). S e a r c h any i t e m (i',(W',H',K'),a') in BL(i) such as C(H',K',H) and (W,H) o~=a. There e x i s t at least one e l e m e n t w h i c h s a t i s f i e s a b o v e c o n d i t i o n s . "Output the w o r d s t r u c t u r e (W',H',K') and R ÷ ( i ' , ( W ' , H ' , K ' ) , a ' ) .
It is e a s y to k n o w t h e o r e m 2 holds. T h e o r e m 2.
A B - p h r a s e s t r u c t u r e of s p e c i f i e d d e p e n d e n c y i n f o r m a t i o n is o u t p u t by the above a l g o r i t h m , if a n d o n l y if the input s t r i n g has at least one B - p h r a s e s t r u c t u r e of s p e c i f i e d d e p e n d e n c y i n f o r m a t i o n and it takes c o n s t a n t m e m o r y space and O(n) e l e m e n t a r y o p e r a t i o n s to o p e r a t e the above
algorithm.
The set of all the d e p e n d e n c y i n f o r m a t i o n s DI of input s t r i n g b is o b t a i n e d from BL(n), since
DI={a I (i, (W,H,K) ,a)£SL(n) , C(H,K) }. D E P E N D E N C Y P A R S E L I S T DL
Let s be a i n p u t s e n t e n c e of N B- phrases. The set of all the d e p e n - dency i n f o r m a t i o n s DI(i) of the i-th B - p h r a s e is o b t a i n e d by o p e r a t i n g the a l g o r i t h m of c o n s t r u c t i o n of BL on the string of the i-th B - p h r a s e .
T h e d e p e n d e n c y p a r s e list DL of s c o n s i s t s of N-i m i n o r lists DL(2), DL(3) , ''- ,DL(N) .
[ ] F o r m of items in DL(i). (ai,J,aj,~,P) I (ai•J,aj, ,P)
where, N ~ i > j ~ l , a i e DI(i), a j E DI(j), c e ~, P ~ and $ is a s p e c i a l l y intro- d u c e d symbol.
I ~ S e m a n t i c s of (ai,J,aj,c,P)6DL(i). (ai,J,aj,c,P) ~ DL]i) • if and o n l y if there is a d e p e n d e n c y struc- ture DS(i,i) of s, w h e r e
(i,J,ai,a~,c) ~ DS(i,i), jGDS (i,l) < P.
~ S e m a n t i c s of ( a i • j • ~ , S , P ) 6 D L ( i ) . (ai•J,?j ,$,P) e Dn(1), if and o n l y if there is a d e p e n d e n c y s t r u c t u r e DS(i,i) of s, w h e r e
a i = i D i D S ( i i) • r a. :JDiDS(i,i), J j is a joint of DS(i,i) e x c e p t
O - t h or 1st joint, jGDS(i,i) = P .
A L G O R I T H M F O R THE C O N S T R U C T I O N OF DL
Input.
The s e q u e n c e of the sets of all d e p e n d e n c y i n f o r m a t i o n s DI(1) , DI(2) , ''" •DI(N) .Output.
D e p e n d e n c y list DL(2), DL(3) , ''" ,DL(N).Method.
STEP 1 ( C o n s t r u c t i o n of DL(2))~ For e a c h a e DI(2)• a 1 6 DI(1) and c E ~ such that ~ e6(c~2,c~i) , add(a2,l,al,c,{c}) to DL(2)• set i to 2 and r e p e a t the STEP 2 and the STEP 3 u n t i l i = N.
STEP 2 ( R e g i s t r a t i o n of items of the form (ai+l,j,aA,c,P)) : For any
(ai,J,aj•c,P) ~ DL(i) and ~ i + 1 6 DI(i+l) , c o m p u t e 6 (ai+ I,~i) and add e v e r y
( a i + l , i , a i , c ' , { c ' } ) to DL(i+i) such that c ' 6 6(ei+1,~i). And, for any
(c~i,J,aj,A,P) 6 DL(i) w h e r e A ~ ~ ~'{$} and a i + 1 £ DI(i+l), c o m p u t e ~(c~i+1,aA) and add e v e r y (ai+l,j,c~j,c',PU {c'}~ to DL(i+i) such that c ' 6 ~ ( a i + 1 , a j) and c ' } P. Go to Step 3.
STEP 3 (Registration of items of
the form (ai+1,j,ej,$,P)): For any
(ai+1,j,al,c,P) ~ DL(i+i) and (al,k,ek, A,P') # DL~j), add (ei+1,k,ak,$,~') to
DL(i+i). Then, set i to i+; and go
to STEP 2. T h e o r e m 3.
If there exist no a m b i g u i t y in the d e p e n d e n c y i n f o r m a t i o n of B - p h r a s e s of input sentence, then the step 3 in the above a l g o r i t h m can be r e p l a c e d to the f o l l o w i n g step 3'.
STEP 3': For e a c h (~Ki+!,j,~A,A,P)
6 DL(Ki+~), add (~i+~,j,aj,~,P) £o DL(i+i), where
de----~ m a x { k I (ai+l k ~k,C,P)
Ki+l , ,
D L (
i+l )}.
Then, set i to i+l and go to STEP 2.
The e f f i c i e n c y of each step of above a l g o r i t h m is as follows.
The m e m o r y size of DL(i) is O(N). The step i, the step 2 and the step 3 take constant, O(N) and O(N ~)
e l e m e n t a r y operations, respectively. The step 3' takes O(N) e l e m e n t a r y o p e r a t i o n s since it takes O(N)
e l e m e n t a r y o p e r a t i o n s to c o m p u t e Ki+ ~ .
Therefore, the t h e o r e m 4 holds. T h e o r e m 4.
The a l g o r i t h m for the c o n s t r u c t i o n of DL requires O(N ~) m e m o r y space and
O(N ~) e l e m e n t a r y operations. Moreover,
if there exist no a m b i g u i t y in the d e p e n d e n c y i n f o r m a t i o n of each B- phrases, the a l g o r i t h m requires O(N ~) e l e m e n t a r y o p e r a t i o n s b y r e p l a c i n g the step 3 w i t h the step 3'
We shall now d e s c r i b e how to find a d e p e n d e n c y s t r u c t u r e of input s e n t e n c e
from DL. To b e g i n with, we shall
e x p l a i n items of p a r t i a l d e p e n d e n c y structure list PDSL.
Form of items in PDSL (i,j,a~,a~,P#)
where, N h i ~ j ~ i a# ~ DI(i) ~ {#},
~ % DI(j) U { ~ , P~ i~ a subset of C or # O a n d # is s p e c i a l l y i n t r o d u c e d symbol.
~ S e m a n t i c s of (i,j,~#,e#.p#)
. ~ i j-
The item (i,j,a~,e~,P#) % PDSL means to be a dependenceS- s t r u c t u r e D S ( i , j ) ~ ~(i,j) such that f o l l o w i n g c o n d i t i o n s i),2) and 3) hold.
i) If a~=~i(%#), then iDiDS (i,j) =e i •
2) If e#=aj(~#!,~ then JDiDS(i,j)=aj.
3) If P~=P(~#). then JGDS(i,j)=P.
Therefore, (N,i,#,#,#) means to be a d e p e n d e n c y structure of the input sentence.
A L G O R I T H M F O R O B T A I N I N G A D E P E N D E N C Y S T R U C T U R E F R O M DL
Input.
DL.Output.
A d e p e n d e n c y structure of input s e n t e n c e or the signal "error".Method.
STEP i: If DL(N) is empty, emit the m e s s a g e "error", else,i n i t i a l i z e PDSL to {(N,i,#,#,#)} and r e p e a t step 2 until PDSL b e c o m e s empty.
STEP 2: Take an item freely out of
PDSL and d e l e t e it from PDSL. A c c o r d -
ing to the form of the item, e x e c u t e i) or 2) or 3).
i) If the item is (N,i,#,#,#) of
the form and (aN,J,ej,c,P) ~ DL(N) , then output (N,J,eN,ej,c), add (N-i,j, #,aj,P/{c}) to PDSL i~ N-i ~ j and add
(j,l,aj,#,#) to PDSL if j ~ i.
2) If the item is (i,l,ei,#,#) of
the form and ( j , ~ i , e j , c , P ) E DL(i), then o u t p u t (i,J,ei,e~,c), add (i-l,j, #,aj,P/{c}) to PDSL i~ i-i @ j and add
(j,l,a~,#,#) to PDSL if j @ 1
3) aIf the item is ( i , j , ~ , e ~ , P ) of the form, w h e r e ~#=~= or #, anda(ai,j, ~ , c , P ) E DL(i), t h e n ± o u t p u t (i,J,~i, e~,c) and add (i-l,j,#,~j,P/{c}) to
PDSL if i-i % j. When there is not
s u c h item in DL(i), s e a r c h a p a i r of items (ei,k,ak,C,P') E DL(i) and (ak,J, ej,A,P) ~ DL(k), then o u t p u t (i,k,ei,ak, cy , add (i-l,k,#,~k,P'/{c}) to PDSL if i-i @ k and add (k,j,~k,ej,P) to PDSL.
PDSL needs O(N) m e m o r y space and STEP i, STEP 2 take constant, O(N) e l e m e n t a r y operations, respectively.
T h e o r e m 5.
A-igorithm for o b t a i n i n g a d e p e n d e n c y structure from DL r e q u i r e s O(N) m e m o r y space and O(N 2) e l e m e n t a r y operations.
P A R S I N G A L G O R I T H M
Input.
A J a p a n e s e sentence in collo- quial style.Output.
A d e p e n d e n c y structure DS(N, i) of the input s e n t e n c e and a B - p h r a s e structure of the j-th B-phrase, w h o s e d e p e n d e n c y i n f o r m a t i o n is JDiDS(N,i), for e v e r y j(j=l,2, "'" ,N).Method.
STEP i: C o n s t r u c t N B - p h r a s e parse lists of all B - p h r a s e s of the input sentence and get the sets of d e p e n d e n c y i n f o r m a t i o n s DI(1), DI(2),• ." , D I ( N ) .
STEP 2: C o n s t r u c t d e p e n d e n c y parse
list DPL from DI(1), DI(2), ... ,DI(N).
STEP 3: O b t a i n a d e p e n d e n c y
structure DS(N,i) of the input s e n t e n c e from DL.
STEP 4: O b t a i n a B - p h r a s e structure
Let n~ be the length of j-th B- phrase (~=i,2, •-- ,N), and N , n d e n o t e the n u m b e r of B - p h r a s e s and the length of input sentence, respectively. Then,
n1+n2+ ... +n N = n N L n
By t h e o r e m i, t h e o r e m 2, t h e o r e m 4 and t h e o r e m 5, next t h e o r e m holds.
T h e o r e m 6.
The p a r s i n g a l g o r i t h m r e q u i r e s
O(n 2) m e m o r y space and O(n 3) e l e m e n t a r y operations. Moreover, if the d e p e n d e n - cy i n f o r m a t i o n of each B - p h r a s e is unambiguous, it r e q u i r e s O(n 2) elemen- tary operations.
3. C O N C L U S I O N
Syntax of J a p a n e s e s e n t e n c e s is stated and a e f f i c i e n t p a r s i n g
a l g o r i t h m is given. A J a p a n e s e sen- tence in c o l l o q u i a l style is p a r s e d b Y the p a r s i n g algorithm, u s i n g time O(n ~) a n d m e m o r y space O(n2), w h e r e n is the length of input sentence. Moreover, it is p a r s e d u s i n g time O(n 2) w h e n e v e r d e p e n d e n c y i n f o r m a t i o n of every B- p h r a s e is unambiguous.
R E F E R E N C E S
i. Aho, U l l m a n : "The T h e o r y of Parsing, T r a n s l a t i o n , and Compil- ing", P r e n t i c e Hall vol. 1 (1975). 2. Woods : " T r a n s i t i o n N e t w o r k
G r a m m a r s for N a t u r a l L a n g u a g e A n a l y s i s " , C o m m u n i c a t i o n of the ACM, 13 (1970).
3. Pratt : "LINGOL - - A P r o g r e s s Report", Proc. IJCAI 4 (1975).
(~)
(~s)
(s)
(2)
0-)
J0 =5) Jl (=4) J2 (=3) J3 (=i)
: main part a: agent
- - : annex part p: patient
J0,Jl,J~,J3 : the sequence of joint
Figure 2. D e p e n d e n c y S t r u c t u r e
Example: Taro read the c o m p o s i t i o n w r i t t e n by Hanako.
Figure i. Ways of S p a c i n g
[image:6.596.325.550.123.251.2]