• No results found

A Syntax Parser Based on the Case Dependency Grammar and Its Efficiency

N/A
N/A
Protected

Academic year: 2020

Share "A Syntax Parser Based on the Case Dependency Grammar and Its Efficiency"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

A S Y N T A X P A R S E R B A S E D ON THE C A S E D E P E N D E N C Y G R A M M A R A N D ITS E F F I C I E N C Y

T o r u H i t a k a and Sho Y o s h i d a

D e p a r t m e n t of E l e c t r o n i c s , K y u s h u U n i v e r s i t y , Fukuoka, J a p a n

S U M M A R Y

A u g u m e n t e d t r a n s i t i o n n e t w o r k

g r a m m a r s (ATNGs) or a u g u m e n t e d c o n t e x t - free g r a m m a r s are g e n e r a l l y u s e d in n a t u r a l l a n g u a g e p r o c e s s i n g systems. The a d v a n t a g e s of A T N G s m a y be summa- r i z e d as i) e f f i c i e n c y of r e p r e s e n t a - tion, 2) p e r s p i c u i t y , 3) g e n e r a t i v e power, and the d i s a d v a n t a g e of A T N G s is t h a t it is d i f f i c u l t to get an effi- c i e n t p a r s i n g a l g o r i t h m b e c u a s e of the f l e x i b i l i t y of their c o m p l i c a t e d

a d d i t i o n a l functions.

In this paper, the s y n t a x of J a p a n e s e s e n t e n c e s , b a s e d on case d e p e n d e n c y r e l a t i o n s are s t a t e d first, a n d t h e n we give an b o t t o m - u p and b r e a d t h - f i r s t p a r s i n g a l g o r i t b x n w h i c h p a r s e s input s e n t e n c e u s i n g time O(n 3) and m e m o r y space O(n2), w h e r e n is the l e n g t h of input sentence. M o r e o v e r , it is shown t h a t this p a r s e r r e q u i r e s time O(n2), w h e n e v e r e a c h B - p h r a s e in input s e n t e n c e is u n a m b i g u o u s in its g r a m m a t i c a l structure. T h e r e f o r e , the e f f i c i e n c y of this p a r s e r is n e a r l y equal to the E a r l e y ' s p a r s e r w h i c h is the m o s t e f f i c i e n t p a r s i n g m e t h o d for g e n e r a l c o n t e x t - f r e e g r a m m a r s .

1. F U N D A M E N T A L S OF J A P A N E S E S E N T E N C E T h e J a p a n e s e s e n t e n c e is o r d i n a r i l y w r i t t e n in k a n a (phonetic) l e t t e r s a n d k a n j i (ideographic) c h a r a c t e r s w i t h o u t l e a v i n g a space b e t w e e n words. F r o m the v i e w p o i n t of m a c h i n e p r o c e s s i n g , however, it is n e c e s s a r y to e x p r e s s c l e a r l y the units c o m p o s i n g the s e n t e n c e in such a w a y as to leave a space b e t w e e n e v e r y w o r d as in English. We have no s t a n d a r d w a y of s p a c i n g the units t h o u g h the n e e d for this has b e e n d e m a n d e d for a long time.

We give some e x a m p l e s in F i g u r e i. T h e first s e n t e n c e in the figure is of o r d i n a r y w r i t t e n form.

The s e c o n d i n d i c a t e s a w a y of s p a c i n g (i.e. p u t t i n g a space b e t w e e n e v e r y word).

The t h i r d i n d i c a t e s a n o t h e r w a y of s p a c i n g (i.e. p u t t i n g a space b e t w e e n e v e r y B - p h r a s e ) .

N o w a d a y s , m a n y o t h e r s p a c i n g m e t h o d s have b e e n t r i e d in s e v e r a l i n s t i t u t e s in Japan.

In this paper, input s e n t e n c e s are g i v e n in c o l l o q u i a l style in w h i c h a s p a c i n g symbol is p l a c e d b e t w e e n two s u c c e s s i v e B - p h r a s e s .

In J a p a n e s e sentences,

BUNSETSUs(B-

phrase) are the m i n i m a l m o r p h o l o g i c a l units of case d e p e n d e n c y , and the s y n t a x of J a p a n e s e s e n t e n c e s c o n s i s t s of (i) the s y n t a x of B - p h r a s e as a s t r i n g of words, and (2) the s y n t a x of a s e n t e n c e as a s t r i n g of B - p h r a s e s .

A B - p h r a s e u s u a l l y p r o n o u n c e d w i t h o u t p a u s i n g c o n s i s t s of t w o parts - - m a i n p a r t [or e q u a l l y an i n d e p e n d e n t p a r t in the c o n v e n t i o n a l school g r a m m a - tical term] and an a n n e x p a r t w h i c h is p o s t p o s i t i o n e d . We d e n o t e the c o n n e c - tion of two p a r t s in a B - p h r a s e by a dot if n e c e s s a r y . A m a i n part, w h i c h is a c o n c e p t u a l w o r d [or e q u a l l y an inde- p e n d e n t word] (e.g. noun, verb,

a d j e c t i v e or adverb) p r o v i d e s m a i n l y the i n f o r m a t i o n of the concept. On the o t h e r hand, an a n n e x part, a p o s s i b l y null string of s u f f i x w o r d s (e.g. auxi- liary verbs or particles) p r o v i d e s the i n f o r m a t i o n c o n c e r n i n g the k a k a r i u k e r e l a t i o n a n d / o r the s u p p l e m e n t a r y

i n f o r m a t i o n (e.g. the s p e a k e r ' s a t t i t u d e t o w a r d s the c o n t e n t s of the sentence, tense, etc.)

A w o r d w has it's s p e l l i n g W, p a r t of s p e e c h H and i n f l e x i o n K. We call

(W,H,K) the w o r d s t r u c t u r e of w.

S u p p o s e t h a t a string b of l e n g t h n be a B - p h r a s e . Then, there e x i s t an i n d e p e n d e n t w o r d w 0 a n d s u f f i x w o r d s Wl, w z, ... , w~, and

b = w 0 w I . • • w~

(2)

B - p h r a s e s and Termi(HQ,Kz) m e a n s a w o r d w h o s e part of speech ~nd i n f l e x i o n are H£, KZ r e s p e c t i v e l y can be a r i g h t - m o s t s u b w o r d of B-phrases.

(i), (2) are c a l l e d the rules of B- phrase structure, and

(W0,H0,K 0) (Wi,HI,K ~) "''(Wz,H~,K ~) • . . ( 3 ) is c a l l e d B - p h r a s e s t r u c t u r e of b. If

(3) satisfies the c o n d i t i o n (i), w 0 w l w

• . . w Z is c a l l e d to be a left p a r t i a l 2

B-phrase.

The kakariuke r e l a t i o n is the depen- dency r e l a t i o n b e t w e e n two B - p h r a s e s in a sentence. A B - p h r a s e has the syn- tactic functions of g o v e r n o r and

dependent. The f u n c t i o n of g o v e r n o r is m a i n l y r e p r e s e n t e d by the i n d e p e n d e n t w o r d of B-phrase. The f u n c t i o n of d e p e n d e n t is m a i n l y r e p r e s e n t e d by the string of p a r t i c l e s w h i c h is the right- m o s t s u b s t r i n g of B - p h r a s e and by the w o r d in front of it (right-most non- p a r t i c l e word).

E v e r y p a r t i c l e has the s y n t a c t i c and p a r t i a l l y s e m a n t i c d e p e n d e n t function w i t h its own degree of power. The p a r t i c l e w h o s e p o w e r of d e p e n d e n t f u n c t i o n is s t r o n g e s t of all p a r t i c l e s a p p e a r i n g in the string of p a r t i c l e s is c a l l e d the r e p r e s e n t a t i v e particle. Therefore, the syntactic f u n c t i o n of d e p e n d e n t of a B - p h r a s e is m a i n l y r e p r e s e n t e d by the r e p r e s e n t a t i v e p a r t i c l e and by the r i g h t - m o s t non- p a r t i c l e word.

Let (W0,H0,K0) , (Wi,Hi,Ki) , (W~,H~,K~) be the w o r d s t r u c t u r e s of i n d e p e n d e n t J word, r i g h t - m o s t n o n - p a r t i c l e w o r d and r e p r e s e n t a t i v e p a r t i c l e of a B-phrase, respectively. Then, <W^,H^>_, <W..,Hi, u u ~ & Hj> d are c a l l e d the i n r o r m a t l o n or g o v e r n o r and the i n f o r m a t i o n of depen- dent of the B - p h r a s e r e s p e c t i v e l y , and the pair (<W0,H0>~,<Wi,Hi,Hj>d) is c a l l e d d e p e n d e n c y ~ i n f o r m a t i 6 n of the B-phrase.

There are m a n y types of d e p e n d e n c y r e l a t i o n such as agent, patient, instrument, location, time, etc. Let C be the set of all types of d e p e n d e n c y relation. The set of all p o s s i b l e d e p e n d e n c y r e l a t i o n s from a B - p h r a s e b l to a B - p h r a s e b 2 is founded on the i n f o r m a t i o n of d e p e n d e n t of b I and the i n f o r m a t i o n of g o v e r n o r of b 2. There- fore, there is a f u n c t i o n 6 w h i c h com- putes the set of all p o s s i b l e d e p e n d e n - cy r e l a t i o n s ~(a,8) b e t w e e n a B - p h r a s e of d e p e n d e n c y i n f o r m a t i o n ~ and a n o t h e r B - p h r a s e of d e p e n d e n c y i n f o r m a t i o n 8. The f u n c t i o n ~ is r e a l i z e d by the d e p e n d e n c y d i c t i o n a r y r e t r i e v e d w i t h the key of two d e p e n d e n c y informations.

The o r d e r of B - p h r a s e is r e l a t i v e l y free in a simple sentence, except for one c o n s t r a i n t that the p r e d i c a t i v e B - p h r a s e g o v e r n i n g the w h o l e sentence must be in the s e n t e n c e ' s final posi- tion. J a p a n e s e is a post p o s i t i o n a l in this sense.

The p a t t e r n of the d e p e n d e n c y r e l a t i o n s in a s e n t e n c e has some

s t r u c t u r a l p r o p e r t y w h i c h is c a l l e d the rules of d e p e n d e n c y structure, and the d e p e n d e n c y r e l a t i o n s in a s e n t e n c e are c a l l e d the d e p e n d e n c y s t r u c t u r e of a sentence. The d e p e n d e n c y s t r u c t u r e of a s e n t e n c e is shown in figure 2, w h e r e arrows i n d i c a t e d e p e n d e n c y rela- tions of v a r i o u s types. The rules of d e p e n d e n c y structure c o n s i s t of follow- ing three conditions.

i Each B - p h r a s e e x c e p t one at the s e n t e n c e final is a d e p e n d e n t of e x a c t l y one B - p h r a s e a p p e a r i n g after it.

ii A d e p e n d e n c y r e l a t i o n b e t w e e n any two B - p h r a s e s does not cross w i t h a n o t h e r d e p e n d e n c y r e l a t i o n s in a sentence.

iii No two d e p e n d e n c y r e l a t i o n s d e p e n d i n g on the same g o v e r n o r are the same.

Let N be the number of B - p h r a s e s in a input sentence, and all B - p h r a s e s are n u m b e r e d d e s c e n d i n g l y from right to left (see figure 2). We shall fix an input sentence, t h r o u g h o u t this chapter. Let DI(i) be the set of all d e p e n d e n c y i n f o r m a t i o n s of i-th B-phrase.

Definition: A d e p e n d e n c y file DF of a s e n t e n c e is a finite set of 5-tuples.

(i,j,ai,ej,c) 6 DF

.... _ ~ { N=i>j=l, a i E DI (i), cde . ~j 6 DI(j) and c E ~ ( a i , a j ) . Definition: If a subset of DF

s a t i s f i e s f o l l o w i n g c o n d i t i o n s i) to 5), it is c a l l e d a d e p e n d e n c y s t r u c t u r e from the Z-th B - p h r a s e to the m - t h B- p h r a s e (N~Z>m~i) and d e n o t e d by DS(£,m) or DS' (i,m).

i) If (i,J,ei,~j,c) 6 DS(£,m), then £~i>jAm.

2) For a r b i t r a r y i(ZAi>m), there exists unique j , a i , e j , c such that

(i,j,ai,ej,c) ~ DS(Z,m).

(Uniqueness of Dependent) 3) If (i,j,a~,a~,c) 6 DS(£,m) and

, , ~ o

(j,k,~j,~k,C) % DS(Z,m), then ~ = ~

~ , O J "

(Uniqueness of B - p h r a s e structure) , 4 ) If (i,J,~i,~j,c) ~ DS(£,m), (i ,j,~f ,~j, ,c ) E DS(£,m) and i>i'>j, then j,hj.

(Nest S t r u c t u r e of Dependency)

(3)

5) If ( i , j , @ i , ~ , c ) e DS(£,m) (i',j,a i, ,~j,c') ~ D~(£,m) and ~ i ' , then c ~ c'.

(Inhibition of D u p l i c a t i o n of a Case) The set of all d e p e n d e n c y s t r u c t u r ~ from i-th B - p h r a s e to m - t h B - p h r a s e is d e n o t e d by ~(~,m). A n y D S ( N , i ) ~ ~(N,i) is c a l l e d a d e p e n d e n c y s t r u c t u r e of the input sentence. The d e p e n d e n c y infor- m a t i o n of j-th B - p h r a s e is unique in DS(i,m), since 2) and 3) hold. Let JDiDS(Z,m) and jGDS(Z,m) be the depen- dency i n f o r m a t i o n of the j-th B - p h r a s e i n D ~ £ , m ) a n d ~ s e t of all the depen- d e n c y r e l a t i o n s that the j-th B - p h r a s e governs in DS(£,m), respectively.

def ~ i , ~ , C )

JGDS(i,m) u__. {c I ( i , J ~ D s ( ~ m ) }

Definition: If the k-th B - p h r a s e (i~k~_m) in DS(£,m) has the f o l l o w i n g property, k ( t h e k - t h B-phrase) is called a joint of DS(£,m):

For any ( i , j , a i , ~ j , c ) ~ DS(~,m) , k~i or J~k.

Let j~(=£) > j, > Jl > "'" > j (=m) be

u ,

the d e s c e n d l n g s e q u e n c e of ale the joints of DS(i,m) (see figure &). Then, the Jk-th B - p h r a s e is c a l l e d the k - t h joint of DS(£,m). There is a d e p e n d e n c y r e l a t i o n from k-th joint

(dependent) to k + i - t h joint(governor) in DS(£,m). Let J.DS(£,m) be a set of all the joints of DS(£,m). DS(£,m/i,j) a subset of DS(Z,m), is d e f i n e d as follows:

D S ( £ , m / i , j ) - ~ { (p,q,av,~o,c) I

(p,q,ap,aq,C) ~ DS(~,my, i~p>q~j}. Lemma i. For any p o s i t i v e i n t e g e r £, i, j, m (N~£~i>j~m), the f o l l o w i n g p r o p o s i t i o n s hold.

(i) D S ( i , m / i , j ) 6 ~ ( i , j ) , if j is a joint of DS(i,m).

(ii) DS(I,j) U D S ( j , m ) ~ ~ ( £ , m ) , if and only if JDiDS(Z,j) = JDiDS(j,m) . (iii) { (Z+i,j ,~, ~,c) }uDS (~,~) 6 ~ ( Z + i ,

m) if and only if (i+l,j,e,8,c) E D F , 8 = J D i D S ( Z , m ) , j E J.DS(£,m) and c ~ j G D S ( Z , m ) .

(iv) If (jk,Jk+1,ak,ek+1,c) ~ DS(j ,m) (k=0,1,2,-..), then Jk is the k- th joint of DS(J0,m).

S y n t a x a n a l y s i s of a J a p a n e s e

s e n t e n c e is d e f i n e d as giving B - p h r a s e s t r u c t u r e s and d e p e n d e n c y s t r u c t u r e of the sentence.

2. THE P A R S I N G A L G O R I T H M AND ITS E F F I C I E N C Y

In this chapter, we shall give a p a r s i n g m e t h o d w h i c h will p a r s e an input sentence using time O(n ~) and

space O(n~), w h e r e n is the length of input sentence. Moreover, if the d e p e n d e n c y i n f o r m a t i o n of each B - p h r a s e is unambiguous, the time v a r i a t i o n is quadratic.

The e s s e n c e of the p a r s i n g a l g o r i t h m is t h e c o n s t r u c t i o n of B - p h r a s e parse list BL and d e p e n d e n c y p a r s e list DL w h i c h are c o n s t r u c t e d e s s e n t i a l l y by a

"dynamic p r o g r a m m i n g " method. The p a r s i n g a l g o r i t h m c o n s i s t s of four m i n o r a l g o r i t h m s that are the c o n s t r u c - tion of BL, the o b t a i n i n g of B - p h r a s e structure, the c o n s t r u c t i o n of DL and the o b t a i n i n g of d e p e n d e n c y structure.

13-PHRASE PARSE LIST

Let b be a string of n length and b(i) denote the i-th c h a r a c t e r from the left end of it.

b = b ( 1 ) (2) ... b(n). The B - p h r a s e p a r s e list of b

c o n s i s t s of n m i n o r lists BL(1), BL(2), • .. , BL(n).

[ ] F o r m of items in BL(j) (i, WS, DI)

where, IL_i < j ~ n , WS is a w o r d s t r u c t u r e and DI is a d e p e n d e n c y information.

[ ] S e m a n t i c s of (i, WS, DI)EBL(j) (i, WS, DI)E BL(j), if and only if there exists a s e q u e n c e of w o r d s w o, w l, ... , w£ s a t i s f y i n g

following two c o n d i t i o n s :

i) b(1)b(2) ... b ( i ) = w 0 w I .. • w~_ I, b(i+l)b(i+2) ... b(j)=w£, and WS is the w o r d s t r u c t u r e of w£. 2) The string of w o r d w _ w I ... w Z is

D

a left p a r c l a l B - p h r a s e of depen- d e n c y i n f o r m a t i o n DI.

A L G O R I T H M F O R THE C O N S T R U C T I O N OF BL

Input.

An input string b=b(1)(2)

• • . b ( n ) .

Output.

The B - p h r a s e parse list BL(1), BL(2), ... , BL(n).

Method.

Step i: Find all the i n d e p e n d e n t w o r d w h i c h are the left- m o s t subwords of b, u s i n g i n d e p e n d e n t w o r d d i c t i o n a r y and for each indepen- dent w o r d w=b(1)b(2) ''. b(j), add

(0, (W,H,K),a) to BL(j) where, (W,H,K) is the w o r d s t r u c t u r e of w and ~ =

(<W,H>K, <W,H,-> d) . Then, set the c o n t r o I w o r d i to 1 and repeat Step 2 until ~ = n •

Step 2: O b t a i n all the suffix words w h i c h are the l e f t - m o s t s u b w o r d s of B(i+l)B(i+2) ... b(n) and for each suffix w o r d w = b ( i + l ) b ( i + 2 ) ... b(k) of w o r d s t r u c t u r e (W' ,H' ,K') , and for each item (j, <W,H,K>,a) # BL(i), add

(4)

C ( H , K , K ' ) . (W',H')0a is a d e p e n d e n c y i n f o r m a t i o n d e f i n e d as follows.

i If H' is a a u x i l i a r y verb• then (W',H')o~ d e f (<~>g,<W,,H,,_>d) where• <a>g is the i n f o r m a t i o n of g o v e r n o r or a.

ii Let <W",H",H"' > be the i n f o r m a - t i o n of d e p e n d e n t of ~. W h e n H' is a particle,

(W,,H,)o a def

. . . . (<a>g,<W",H",H'>d)

if the p o w e r of d e p e n d e n c y f u n c t i o n of H' is s t r o n g e r than that of H"' , and e l s e

(W,,H,)o ~ def ~.

T h e r e e x i s t s u p p e r limit in the l e n g t h of w o r d s a n d there e x i s t s u p p e r limit in the n u m b e r of d e p e n d e n c y i n f o r m a t i o n s of all left p a r t i a l B- p h r a s e of a(1)a(2) ... a(i). Therefore, there exists u p p e r limit for the

n e c e s s a r y size of m e m o r y space of BL(i) and the t h e o r e m 1 follows.

T h e o r e m i.

A l g o r i t h m for the c o n s t r u c t i o n of BL r e q u i r e s O(n) m e m o r y space and O(n) e l e m e n t a r y o p e r a t i o n s .

We shall n o w d e s c r i b e h o w to find a B - p h r a s e s t r u c t u r e of s p e c i f i e d d e p e n - d e n c y i n f o r m a t i o n from BL. The m e t h o d is g i v e n as follows.

A L G O R I T H M F O R O B T A I N I N G A B - P H R A S E S T R U C T U R E OF A N I N P U T S T R I N G

Input.

The s p e c i f i e d d e p e n d e n c y i n f o r m a t i o n ~ and BL.

Output.

A B - p h r a s e s t r u c t u r e of d e p e n d e n c y i n f o r m a t i o n a or the e r r o r signal "error".

Method.

STEP i: S e a r c h any i t e m (i,(W,H,K),a) in BL(n) such as Termi (H,H). If there is no such item, then emit "error" and halt. O t h e r w i s e , o u t p u t the w o r d s t r u c t u r e (W,H,K), set the r e g i s t e r R to (i,(W,H,K),a) and r e p e a t the step 2 until i = 0.

STEp 2: Let R be (i,(W,H,K),e). S e a r c h any i t e m (i',(W',H',K'),a') in BL(i) such as C(H',K',H) and (W,H) o~=a. There e x i s t at least one e l e m e n t w h i c h s a t i s f i e s a b o v e c o n d i t i o n s . "Output the w o r d s t r u c t u r e (W',H',K') and R ÷ ( i ' , ( W ' , H ' , K ' ) , a ' ) .

It is e a s y to k n o w t h e o r e m 2 holds. T h e o r e m 2.

A B - p h r a s e s t r u c t u r e of s p e c i f i e d d e p e n d e n c y i n f o r m a t i o n is o u t p u t by the above a l g o r i t h m , if a n d o n l y if the input s t r i n g has at least one B - p h r a s e s t r u c t u r e of s p e c i f i e d d e p e n d e n c y i n f o r m a t i o n and it takes c o n s t a n t m e m o r y space and O(n) e l e m e n t a r y o p e r a t i o n s to o p e r a t e the above

algorithm.

The set of all the d e p e n d e n c y i n f o r m a t i o n s DI of input s t r i n g b is o b t a i n e d from BL(n), since

DI={a I (i, (W,H,K) ,a)£SL(n) , C(H,K) }. D E P E N D E N C Y P A R S E L I S T DL

Let s be a i n p u t s e n t e n c e of N B- phrases. The set of all the d e p e n - dency i n f o r m a t i o n s DI(i) of the i-th B - p h r a s e is o b t a i n e d by o p e r a t i n g the a l g o r i t h m of c o n s t r u c t i o n of BL on the string of the i-th B - p h r a s e .

T h e d e p e n d e n c y p a r s e list DL of s c o n s i s t s of N-i m i n o r lists DL(2), DL(3) , ''- ,DL(N) .

[ ] F o r m of items in DL(i). (ai,J,aj,~,P) I (ai•J,aj, ,P)

where, N ~ i > j ~ l , a i e DI(i), a j E DI(j), c e ~, P ~ and $ is a s p e c i a l l y intro- d u c e d symbol.

I ~ S e m a n t i c s of (ai,J,aj,c,P)6DL(i). (ai,J,aj,c,P) ~ DL]i) • if and o n l y if there is a d e p e n d e n c y struc- ture DS(i,i) of s, w h e r e

(i,J,ai,a~,c) ~ DS(i,i), jGDS (i,l) < P.

~ S e m a n t i c s of ( a i • j • ~ , S , P ) 6 D L ( i ) . (ai•J,?j ,$,P) e Dn(1), if and o n l y if there is a d e p e n d e n c y s t r u c t u r e DS(i,i) of s, w h e r e

a i = i D i D S ( i i) • r a. :JDiDS(i,i), J j is a joint of DS(i,i) e x c e p t

O - t h or 1st joint, jGDS(i,i) = P .

A L G O R I T H M F O R THE C O N S T R U C T I O N OF DL

Input.

The s e q u e n c e of the sets of all d e p e n d e n c y i n f o r m a t i o n s DI(1) , DI(2) , ''" •DI(N) .

Output.

D e p e n d e n c y list DL(2), DL(3) , ''" ,DL(N).

Method.

STEP 1 ( C o n s t r u c t i o n of DL(2))~ For e a c h a e DI(2)• a 1 6 DI(1) and c E ~ such that ~ e6(c~2,c~i) , add

(a2,l,al,c,{c}) to DL(2)• set i to 2 and r e p e a t the STEP 2 and the STEP 3 u n t i l i = N.

STEP 2 ( R e g i s t r a t i o n of items of the form (ai+l,j,aA,c,P)) : For any

(ai,J,aj•c,P) ~ DL(i) and ~ i + 1 6 DI(i+l) , c o m p u t e 6 (ai+ I,~i) and add e v e r y

( a i + l , i , a i , c ' , { c ' } ) to DL(i+i) such that c ' 6 6(ei+1,~i). And, for any

(c~i,J,aj,A,P) 6 DL(i) w h e r e A ~ ~ ~'{$} and a i + 1 £ DI(i+l), c o m p u t e ~(c~i+1,aA) and add e v e r y (ai+l,j,c~j,c',PU {c'}~ to DL(i+i) such that c ' 6 ~ ( a i + 1 , a j) and c ' } P. Go to Step 3.

(5)

STEP 3 (Registration of items of

the form (ai+1,j,ej,$,P)): For any

(ai+1,j,al,c,P) ~ DL(i+i) and (al,k,ek, A,P') # DL~j), add (ei+1,k,ak,$,~') to

DL(i+i). Then, set i to i+; and go

to STEP 2. T h e o r e m 3.

If there exist no a m b i g u i t y in the d e p e n d e n c y i n f o r m a t i o n of B - p h r a s e s of input sentence, then the step 3 in the above a l g o r i t h m can be r e p l a c e d to the f o l l o w i n g step 3'.

STEP 3': For e a c h (~Ki+!,j,~A,A,P)

6 DL(Ki+~), add (~i+~,j,aj,~,P) £o DL(i+i), where

de----~ m a x { k I (ai+l k ~k,C,P)

Ki+l , ,

D L (

i+l )}.

Then, set i to i+l and go to STEP 2.

The e f f i c i e n c y of each step of above a l g o r i t h m is as follows.

The m e m o r y size of DL(i) is O(N). The step i, the step 2 and the step 3 take constant, O(N) and O(N ~)

e l e m e n t a r y operations, respectively. The step 3' takes O(N) e l e m e n t a r y o p e r a t i o n s since it takes O(N)

e l e m e n t a r y o p e r a t i o n s to c o m p u t e Ki+ ~ .

Therefore, the t h e o r e m 4 holds. T h e o r e m 4.

The a l g o r i t h m for the c o n s t r u c t i o n of DL requires O(N ~) m e m o r y space and

O(N ~) e l e m e n t a r y operations. Moreover,

if there exist no a m b i g u i t y in the d e p e n d e n c y i n f o r m a t i o n of each B- phrases, the a l g o r i t h m requires O(N ~) e l e m e n t a r y o p e r a t i o n s b y r e p l a c i n g the step 3 w i t h the step 3'

We shall now d e s c r i b e how to find a d e p e n d e n c y s t r u c t u r e of input s e n t e n c e

from DL. To b e g i n with, we shall

e x p l a i n items of p a r t i a l d e p e n d e n c y structure list PDSL.

Form of items in PDSL (i,j,a~,a~,P#)

where, N h i ~ j ~ i a# ~ DI(i) ~ {#},

~ % DI(j) U { ~ , P~ i~ a subset of C or # O a n d # is s p e c i a l l y i n t r o d u c e d symbol.

~ S e m a n t i c s of (i,j,~#,e#.p#)

. ~ i j-

The item (i,j,a~,e~,P#) % PDSL means to be a dependenceS- s t r u c t u r e D S ( i , j ) ~ ~(i,j) such that f o l l o w i n g c o n d i t i o n s i),2) and 3) hold.

i) If a~=~i(%#), then iDiDS (i,j) =e i •

2) If e#=aj(~#!,~ then JDiDS(i,j)=aj.

3) If P~=P(~#). then JGDS(i,j)=P.

Therefore, (N,i,#,#,#) means to be a d e p e n d e n c y structure of the input sentence.

A L G O R I T H M F O R O B T A I N I N G A D E P E N D E N C Y S T R U C T U R E F R O M DL

Input.

DL.

Output.

A d e p e n d e n c y structure of input s e n t e n c e or the signal "error".

Method.

STEP i: If DL(N) is empty, emit the m e s s a g e "error", else,

i n i t i a l i z e PDSL to {(N,i,#,#,#)} and r e p e a t step 2 until PDSL b e c o m e s empty.

STEP 2: Take an item freely out of

PDSL and d e l e t e it from PDSL. A c c o r d -

ing to the form of the item, e x e c u t e i) or 2) or 3).

i) If the item is (N,i,#,#,#) of

the form and (aN,J,ej,c,P) ~ DL(N) , then output (N,J,eN,ej,c), add (N-i,j, #,aj,P/{c}) to PDSL i~ N-i ~ j and add

(j,l,aj,#,#) to PDSL if j ~ i.

2) If the item is (i,l,ei,#,#) of

the form and ( j , ~ i , e j , c , P ) E DL(i), then o u t p u t (i,J,ei,e~,c), add (i-l,j, #,aj,P/{c}) to PDSL i~ i-i @ j and add

(j,l,a~,#,#) to PDSL if j @ 1

3) aIf the item is ( i , j , ~ , e ~ , P ) of the form, w h e r e ~#=~= or #, anda(ai,j, ~ , c , P ) E DL(i), t h e n ± o u t p u t (i,J,~i, e~,c) and add (i-l,j,#,~j,P/{c}) to

PDSL if i-i % j. When there is not

s u c h item in DL(i), s e a r c h a p a i r of items (ei,k,ak,C,P') E DL(i) and (ak,J, ej,A,P) ~ DL(k), then o u t p u t (i,k,ei,ak, cy , add (i-l,k,#,~k,P'/{c}) to PDSL if i-i @ k and add (k,j,~k,ej,P) to PDSL.

PDSL needs O(N) m e m o r y space and STEP i, STEP 2 take constant, O(N) e l e m e n t a r y operations, respectively.

T h e o r e m 5.

A-igorithm for o b t a i n i n g a d e p e n d e n c y structure from DL r e q u i r e s O(N) m e m o r y space and O(N 2) e l e m e n t a r y operations.

P A R S I N G A L G O R I T H M

Input.

A J a p a n e s e sentence in collo- quial style.

Output.

A d e p e n d e n c y structure DS(N, i) of the input s e n t e n c e and a B - p h r a s e structure of the j-th B-phrase, w h o s e d e p e n d e n c y i n f o r m a t i o n is JDiDS(N,i), for e v e r y j(j=l,2, "'" ,N).

Method.

STEP i: C o n s t r u c t N B - p h r a s e parse lists of all B - p h r a s e s of the input sentence and get the sets of d e p e n d e n c y i n f o r m a t i o n s DI(1), DI(2),

• ." , D I ( N ) .

STEP 2: C o n s t r u c t d e p e n d e n c y parse

list DPL from DI(1), DI(2), ... ,DI(N).

STEP 3: O b t a i n a d e p e n d e n c y

structure DS(N,i) of the input s e n t e n c e from DL.

STEP 4: O b t a i n a B - p h r a s e structure

(6)

Let n~ be the length of j-th B- phrase (~=i,2, •-- ,N), and N , n d e n o t e the n u m b e r of B - p h r a s e s and the length of input sentence, respectively. Then,

n1+n2+ ... +n N = n N L n

By t h e o r e m i, t h e o r e m 2, t h e o r e m 4 and t h e o r e m 5, next t h e o r e m holds.

T h e o r e m 6.

The p a r s i n g a l g o r i t h m r e q u i r e s

O(n 2) m e m o r y space and O(n 3) e l e m e n t a r y operations. Moreover, if the d e p e n d e n - cy i n f o r m a t i o n of each B - p h r a s e is unambiguous, it r e q u i r e s O(n 2) elemen- tary operations.

3. C O N C L U S I O N

Syntax of J a p a n e s e s e n t e n c e s is stated and a e f f i c i e n t p a r s i n g

a l g o r i t h m is given. A J a p a n e s e sen- tence in c o l l o q u i a l style is p a r s e d b Y the p a r s i n g algorithm, u s i n g time O(n ~) a n d m e m o r y space O(n2), w h e r e n is the length of input sentence. Moreover, it is p a r s e d u s i n g time O(n 2) w h e n e v e r d e p e n d e n c y i n f o r m a t i o n of every B- p h r a s e is unambiguous.

R E F E R E N C E S

i. Aho, U l l m a n : "The T h e o r y of Parsing, T r a n s l a t i o n , and Compil- ing", P r e n t i c e Hall vol. 1 (1975). 2. Woods : " T r a n s i t i o n N e t w o r k

G r a m m a r s for N a t u r a l L a n g u a g e A n a l y s i s " , C o m m u n i c a t i o n of the ACM, 13 (1970).

3. Pratt : "LINGOL - - A P r o g r e s s Report", Proc. IJCAI 4 (1975).

(~)

(~s)

(s)

(2)

0-)

J0 =5) Jl (=4) J2 (=3) J3 (=i)

: main part a: agent

- - : annex part p: patient

J0,Jl,J~,J3 : the sequence of joint

Figure 2. D e p e n d e n c y S t r u c t u r e

Example: Taro read the c o m p o s i t i o n w r i t t e n by Hanako.

Figure i. Ways of S p a c i n g

[image:6.596.325.550.123.251.2]

Figure

Figure 2. Dependency Structure

References

Related documents

However, noradrenaline ( I O ^ - I O &#34; 6 g/ml) had no effect on the membrane potential, (b) Changes of the membrane resistance which appeared on treatment with

De meeste gebruikte methode om een waardering te geven is de discounted cashflow methode (DCF methode). Omdat er bij deze rekenmethode subjectieve inputvariabelen zijn, die de

SOCIAL MEDIA TRENDS Vs USER SATISFACTION: A CASE STUDY ON THE IMPACT OF SOCIAL MEDIA ON UNIVERSITY STUDENTS IN JAZAN.. Ms.

Note that all elements in Ᏽ are finitely generated projective, and so, by the proof of Theorem 2.4 ( − ) ∗ ⊗ R E = Hom R ( − ,E), so in other words, if R has a Matlis

In today’s economic conditions hotels want to keep its Star performers are they are stressing upon multiskilling of a staff.. People might feel stress in this

Kubo, “Buzano’s inequality and bounds for roots of algebraic equations,” Proceedings of the American Mathematical Society, vol. Yeh, “The numerical radius and bounds for zeros of

The goal of this study was to evaluate a spectrum of melanocytic lesions, includ- ing common acquired nevi, Spitz nevi, atypical (dys- plastic) nevi, primary invasive melanomas,

Then, after it was confirmed that, in our cell model, PMMA showed a time-dependent manner to trigger pro- duction of all the pro-inflammatory cytokines studied, we