
LANGUAGE-BASED ENVIRONMENT FOR NATURAL LANGUAGE PARSING

Lehtola, A., Jäppinen, H., Nelimarkka, E.
SITRA Foundation (*) and Helsinki University of Technology
Helsinki, Finland

ABSTRACT

This paper introduces a special programming environment for the definition of grammars and for the implementation of corresponding parsers. In natural language processing systems it is advantageous to keep linguistic knowledge and processing mechanisms separate. Our environment accepts grammars consisting of binary dependency relations and grammatical functions. Well-formed expressions of functions and relations provide constituent surroundings for syntactic categories in the form of two-way automata. These relations, functions, and automata are described in a special definition language.

By focusing on high-level descriptions, a linguist may ignore the computational details of the parsing process. The linguist writes the grammar as a DPL-description, and a compiler translates it into efficient LISP-code. The environment also has a tracing facility for the parsing process, grammar-sensitive lexical maintenance programs, and routines for the interactive graphic display of parse trees and grammar definitions. Translator routines are also available for transporting compiled code between various LISP-dialects. The environment itself currently exists in INTERLISP and FRANZLISP. This paper focuses on knowledge engineering issues and does not enter into linguistic argumentation.

INTRODUCTION

Our objective has been to build a parser for Finnish that works as a practical tool in real production applications. At the beginning of our work we faced two major problems. First, there was so far no formal description of Finnish grammar. Second, the structure of Finnish differs greatly from that of the Indo-European languages. Finnish has relatively free word order, and syntactico-semantic knowledge in a sentence is often expressed in the inflections of the words. Therefore existing parsing methods for Indo-European languages (e.g. ATN, DCG, LFG) did not seem to grasp the idiosyncrasies of Finnish.

The parser system we have developed is based on functional dependency. A grammar is specified by a family of two-way finite automata and by dependency function and relation definitions. Each automaton expresses the valid dependency context of one constituent type. In an abstract sense, the working storage of the parser consists of two constituent stacks and a register which holds the current constituent (Figure 1).

[Figure 1 diagram: the register of the current constituent lies between the left constituent stack (L1, L2, L3) and the right constituent stack (R1, R2, R3).]

Figure 1. The working storage of DPL-parsers

(*) SITRA Foundation


[Figure 2 diagram: state-transition graph of the verb automaton. Recoverable labels include transitions that test +Phrase and -Phrase dependents for Subject and Adverbial, tests for +Nominal and -Nominal dependents, the conditions "empty left-hand side" and "end of input", and the operation nodes BUILD PHRASE ON RIGHT and FIND REGENT ON RIGHT.]

Notations: A state node of the automaton carries a name in which the question mark indicates the direction. A state transition is labelled with its priority, the conditions for the dependent candidate (if not otherwise stated), and the connection function. Double circles are used to denote entries and exits of an automaton; inside them is expressed the manner of operation.

Figure 2. A two-way automaton for Finnish verbs

The two stacks hold the right and left contexts of the current constituent. The parsing process is always directed by the expectations of the current constituent. Dynamic local control is realized by permitting the automata to activate one another. The basic decision for the automaton associated with the current constituent is to accept or reject a neighbor via a valid syntactico-semantic subordinate relation. Acceptance subordinates the neighbor, and it disappears from the stack. The structure an input sentence receives is an annotated tree of such binary relations.
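The working storage and the accept decision can be illustrated with a minimal Python sketch. It is not the authors' LISP implementation: the class names `Constituent` and `WorkingStorage` and the method `accept` are hypothetical, and the stacks are modelled as Python lists whose last element is the nearest neighbor.

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    head: str
    # (function label, dependent Constituent) pairs collected so far
    dependents: list = field(default_factory=list)


@dataclass
class WorkingStorage:
    left: list             # left stack; left[-1] is the nearest neighbor (L1)
    current: Constituent   # register of the current constituent
    right: list            # right stack; right[-1] is the nearest neighbor (R1)

    def accept(self, direction, label):
        """Subordinate the nearest neighbor on `direction` under `label`."""
        stack = self.left if direction == "LEFT" else self.right
        neighbor = stack.pop()  # the accepted neighbor disappears from the stack
        self.current.dependents.append((label, neighbor))
        return neighbor


ws = WorkingStorage(
    left=[Constituent("POIKA")],
    current=Constituent("TULLA"),
    right=[Constituent("KENTTÄ"), Constituent("ILTA")],
)
ws.accept("LEFT", "Subject")
print([(label, dep.head) for label, dep in ws.current.dependents])
```

Rejecting a neighbor would simply leave both stacks untouched; the sketch omits the dependency tests that decide between the two outcomes.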

An automaton for verbs is described in Figure 2. When a verb becomes the current constituent for the first time, it enters the automaton through the START node. The automaton expects to find a dependent on the left (?V). If the left [...] Subject and then for Object. When a function test succeeds, the neighbor is subordinated and the verb advances to the state indicated by the arcs. The double-circle states denote entry and exit points of the automaton.

If completed constituents do not exist as neighbors, an automaton may defer its decision. In Figure 2, the states labelled "BUILD PHRASE ON RIGHT" and "FIND REGENT ON RIGHT" push the verb onto the left stack and pop the right stack for the current constituent. When the verb is activated later on, the control flow continues from the state expressed in the deactivation command.
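The deferral step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: constituents are plain dictionaries, and the function name `build_phrase_on_right` and the key `"state"` are hypothetical, not part of DPL.

```python
def build_phrase_on_right(left, current, right, return_state):
    """Deactivate `current` and activate its right-hand neighbor.

    `return_state` is the state named in the deactivation command; the
    deactivated constituent resumes there when it is activated again.
    """
    current["state"] = return_state  # remember where to continue later
    left.append(current)             # the verb is pushed onto the left stack
    new_current = right.pop()        # the right stack is popped for the new current constituent
    return left, new_current, right


verb = {"head": "HEITTÄÄ", "state": "V?"}
noun = {"head": "KIEKKO", "state": "?N"}
left, current, right = build_phrase_on_right([], verb, [noun], "V?")
print(current["head"], [c["head"] for c in left])
```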


The functions, relations and automata are expressed in a special conditional expression formalism, DPL (for Dependency Parser Language). We believe that DPL might find applications in other inflectional languages as well.

DPL-DESCRIPTIONS

The main object in DPL is a constituent. A grammar specification opens with the structural descriptions of constituents and the allowed property names and property values. The user may specify simple properties, features or categories. The structures of the lexical entries are also defined at the beginning. The syntax of these declarations can be seen in Figure 3.

All properties of constituents may be referred to in a uniform manner, directly by their values. The system automatically takes into account the computational details associated with property types. For example, the system automatically handles the inheritance of properties in their hierarchies. Extensive support for multidimensional analysis has been one of the central objectives in the design of the DPL-formalism. Patterning can be done in multiple dimensions, and the property set associated with constituents can easily be extended.

An example of a constituent structure and its property definitions is given in Figure 4. The description states first that each constituent contains Function, Role, ConstFeat, PropOfLexeme and MorphChar. The next two definitions further specify PropOfLexeme and MorphChar. In the last part the definition of a category tree SemCat is given. This tree has sets of property values associated with its nodes. The DPL-system automatically takes care of their inheritance. Thus, for a constituent that belongs to the semantic category Human, the system automatically associates the feature values +Hum, +Anim, +Countable, and +Concr.
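The inheritance of feature values along the category tree can be illustrated with a short Python sketch. The node data is taken from the SemCat tree of Figure 4; the dictionary layout and the function name `inherited_features` are assumptions made for this illustration and say nothing about how the DPL-compiler represents categories internally.

```python
# node name -> (own feature values, father node), following the SemCat tree
SEM_CAT = {
    "Entity":   ((), None),
    "Concrete": (("+Concr",), "Entity"),
    "Animate":  (("+Anim", "+Countable"), "Concrete"),
    "Human":    (("+Hum",), "Animate"),
    "Animals":  ((), "Animate"),
    "NonAnim":  ((), "Concrete"),
    "Matter":   (("-Countable",), "NonAnim"),
    "Thing":    (("+Countable",), "NonAnim"),
}


def inherited_features(node, tree=SEM_CAT):
    """Collect the feature values of `node` and of all its ancestors."""
    features = []
    while node is not None:
        own, father = tree[node]
        features.extend(own)
        node = father
    return features


print(inherited_features("Human"))  # ['+Hum', '+Anim', '+Countable', '+Concr']
```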

<constituent structure>  ::= ( CONSTITUENT: <list of properties> .. )
<subtree of constituent> ::= ( SUBTREE: <glue node> <list of properties> ) |
                             ( LEXICON-ENTRY: <glue node> <list of properties> )
<list of properties>     ::= ( <list of properties> .. ) |
                             ( <property name> .. )
<property name>          ::= <type name> | <glue node name>
<type name>              ::= <unique lisp atom>
<glue node name>         ::= <unique lisp atom>
<glue node>              ::= <glue node name in upper level>

<property declaration>   ::= ( PROPERTY: <type name> <possible values> ) |
                             ( FEATURE: <type name> <possible values> ) |
                             ( CATEGORY: <type name> < <node definition> .. > )
<possible values>        ::= < <default value> <unique lisp atom> .. >
<default value>          ::= NoDefault | <unique lisp atom>
<node definition>        ::= ( <node name> <feature set> <father node> )
<node name>              ::= <unique lisp atom>
<feature set>            ::= ( <feature value> ) | <empty>
<father node>            ::= / <name of an already defined node> | <empty>
<empty>                  ::=

Figure 3. The syntax of constituent structure and property definitions

(CONSTITUENT:   ( Function Role ConstFeat PropOfLexeme MorphChar ) )

(LEXICON-ENTRY: PropOfLexeme
                ( ( SyntCat SyntFeat ) ( SemCat SemFeat ) ( FrameCat LexFrame ) AKO ) )

(SUBTREE:       MorphChar
                ( Polar Voice Modal Tense Comparison Number Case PersonN PersonP Clit1 Clit2 ) )

(CATEGORY:      SemCat
                < ( Entity )
                  ( Concrete ( +Concr ) / Entity )
                  ( Animate ( +Anim +Countable ) / Concrete )
                  ( Human ( +Hum ) / Animate )
                  ( Animals / Animate )
                  ( NonAnim / Concrete )
                  ( Matter ( -Countable ) / NonAnim )
                  ( Thing ( +Countable ) / NonAnim ) > )

Figure 4. An example of a constituent structure specification and the definition of a category tree

The binary grammatical functions and relations are defined using the syntax in Figure 5. A DPL-function returns as its value the binary construct built from the current constituent (C) and its dependent candidate (D), or it returns NIL. DPL-relations return as their values the pairs of C and D constituents that have passed the associated predicate filter. By choosing operators, a user may vary a predication between simple equality (=) and equality with ambiguity elimination (=:=). The operators := and :- denote replacement and insertion, respectively. In predicate expressions, angle brackets signal the scope of an implicit OR-operator and parentheses that of an implicit AND-operator. An arrow triggers defaults: the elements of expressions to the right of an arrow are in the OR-relation and those to the left of it are in the AND-relation. Two kinds of arrows are in use. A simple arrow (->) performs all operations on the right, and a double arrow (=>) terminates the execution at the first successful operation.
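The difference between the two arrows can be made concrete with a small Python sketch. It models conditions and operations as zero-argument callables that return a truth value; the function names `apply_rule` and `recording_op` are hypothetical, and the sketch reflects one reading of the description above rather than the compiled LISP code.

```python
def apply_rule(conditions, arrow, operations):
    """Evaluate one `conditions <arrow> operations` expression."""
    # Left of the arrow: implicit AND over all conditions.
    if not all(cond() for cond in conditions):
        return False
    if arrow == "->":
        # Simple arrow: perform every operation, succeed if any succeeded.
        results = [op() for op in operations]
        return any(results)
    if arrow == "=>":
        # Double arrow: any() stops at the first successful operation.
        return any(op() for op in operations)
    raise ValueError(f"unknown arrow: {arrow}")


def recording_op(log, name, succeeds):
    """Build an operation that records its name in `log` when it runs."""
    def op():
        log.append(name)
        return succeeds
    return op


log = []
ops = [recording_op(log, "a", True), recording_op(log, "b", True)]
apply_rule([lambda: True], "->", ops)
print(log)  # ['a', 'b']: both operations ran
log.clear()
apply_rule([lambda: True], "=>", ops)
print(log)  # ['a']: execution stopped at the first success
```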

Figure 6 gives an example of how one may define Subject. If the relation RecSubj holds between the regent and the dependent candidate, the latter is labelled Subject and subordinated to the former. The relational expression RecSubj defines the property patterns the constituents should match.

A grammar definition ends with the context specifications of constituents expressed as two-way automata. The automata are described using the notation shown in somewhat simplified form in Figure 7. An automaton can refer to up to three constituents to the right or left using the indexed names L1, L2, L3, R1, R2 or R3.

<function>          ::= ( FUNCTION: <function name> <operation expr> )
<relation>          ::= ( RELATION: <relation name> <operation expr> )
<operation expr>    ::= ( <predicate expr> .. <impl> <operation expr> .. ) |
                        <predicate expr> |
                        <relation name> |
                        ( DEL <constituent label> )
<predicate expr>    ::= < <predicate expr> > |
                        ( <predicate expr> ) |
                        ( <constituent pointer> <operator> <value expr> )
<impl>              ::= -> | =>
<constituent label> ::= C | D
<operator>          ::= = | := | :- | =:=
<value expr>        ::= < <value expr> .. > |
                        ( <value expr> .. ) |
                        <value of some property> |
                        '<lexeme> |
                        ( <property name> <constituent label> )

[Figure 5: the syntax of DPL-functions and DPL-relations]


(FUNCTION: Subject
   ( RecSubj -> (D := Subject) ) )

(RELATION: RecSubj
   ( (C = Act <Ind Cond Pot Imper>) (D = -Sentence +Nominal)
     -> ( (D = Nom)
          -> (D = PersPron (PersonP C) (PersonN C))
             ( (D = Noun) (C = 3P)
               -> ( (C = S) (D = SG) )
                  ( (C = P) (D = PL) ) ) )
        ( (D = Part) (C = S 3P)
          -> ( (C = 'OLLA)
               => (C :- +Existence) )
             ( (C = -Transitive +Existence) ) ) ) )

Figure 6. A realisation of Subject

<state in autom.> ::= ( STATE: <state name> <direction> <state expr> .. )
<direction>       ::= LEFT | RIGHT
<state expr>      ::= ( <lhs of s.expr> <impl> <state expr> .. ) |
                      ( <lhs of s.expr> <impl> <state change> )
<lhs of s.expr>   ::= <function name> | <predicate expr> ..
<state change>    ::= ( C := <name of next state> ) |
                      ( FIND-REG-ON <direction> <sstate ch.> ) |
                      ( BUILD-PHRASE-ON <direction> <sstate ch.> ) |
                      ( PARSED )
<state change>    ::= <worksp. manip.> <state change>
<sstate ch.>      ::= ( C := <name of return state> )
<worksp. manip.>  ::= ( DEL <constituent label> ) |
                      ( TRANSPOSE <constituent label> <constituent label> )

Figure 7. Simplified syntax of state specifications

( STATE: V? RIGHT
   ( (D = +Phrase) -> ( Subject   -> (C := VS?) )
                      ( Object    -> (C := VO?) )
                      ( Adverbial -> (C := V?) )
                      ( T         => (C := ?VFinal) ) )
   ( (D = -Phrase) -> ( BUILD-PHRASE-ON RIGHT (C := V?) ) ) )

[Figure 8: the definition of the state V? of Figure 2]


The direction of a state (see Figure 2) selects the dependent candidate, normally as L1 or R1. A switch of state takes place by an assignment, in the same way as linguistic properties are assigned. As an example, the node V? of Figure 2 is defined formally in Figure 8.
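To make the control flow of the state V? easier to follow, here is a simplified Python reading of it. It is a sketch under clear assumptions: the function name `v_state_right`, the callback `function_succeeds` and the returned tuples are hypothetical, and the sketch returns only the first applicable transition, whereas the simple arrow in DPL performs all operations and the real parser keeps ambiguous alternatives.

```python
def v_state_right(d_is_phrase, function_succeeds):
    """Choose the action taken in state V? for the right-hand neighbor D.

    d_is_phrase       -- True if D carries the feature +Phrase
    function_succeeds -- hypothetical callback: DPL-function name -> bool
    """
    if not d_is_phrase:
        # D is not yet a completed phrase: defer, and resume later in V?.
        return ("BUILD-PHRASE-ON", "RIGHT", "V?")
    transitions = (("Subject", "VS?"), ("Object", "VO?"), ("Adverbial", "V?"))
    for function, next_state in transitions:
        if function_succeeds(function):
            return ("GOTO", next_state)
    # The T branch: no function accepted the neighbor.
    return ("GOTO", "?VFinal")


print(v_state_right(True, lambda name: name == "Object"))  # ('GOTO', 'VO?')
```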

More linguistically oriented argumentation for the DPL-formalism appears elsewhere (Nelimarkka, 1984a, and Nelimarkka, 1984b).

THE ARCHITECTURE OF THE DPL-ENVIRONMENT

The architecture of the DPL-environment is described schematically in Figure 9. The main parts are highlighted by heavy lines. Single arrows represent data transfer; double arrows indicate the production of data structures. All modules have been implemented in LISP. The realisations do not rely on specifics of the underlying LISP-environments.

The DPL-compiler

A compilation results in the executable code of a parser. The compiler produces highly optimized code (Lehtola, 1984). Internally, data structures are only partly dynamic, for the sake of fast information fetch. Ambiguities are expressed locally to minimize redundant search. The principle of structure sharing is followed whenever new data structures are built. For the manipulation of constituent structures there exists a special service routine for each combination of property and predication types. These routines take special care of time and memory consumption. For instance, with regard to replacements and insertions, the copying physically includes only the path from the root of the list structure to the changed sublist. The logically shared parts are also shared physically. This stipulation minimizes memory usage.
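The path-copying idea can be shown with a minimal Python sketch on nested tuples. It is an analogy, not the compiler's routine: the function name `replace_at` and the sample structure are hypothetical, and where LISP would copy only the cons cells along the path, the sketch rebuilds each tuple on the path while reusing all untouched subtrees.

```python
def replace_at(tree, path, new_value):
    """Return a copy of nested tuple `tree` with the element at `path` replaced.

    Only the tuples along `path` are rebuilt; every other subtree of the
    result is the very same object as in the original.
    """
    if not path:
        return new_value
    i = path[0]
    changed = replace_at(tree[i], path[1:], new_value)
    return tree[:i] + (changed,) + tree[i + 1:]


old = (("Function", ("Subject",)), ("MorphChar", ("SG", "Nom")))
new = replace_at(old, (1, 1, 0), "PL")
print(new[1][1])         # ('PL', 'Nom')
print(new[0] is old[0])  # True: the untouched branch is shared, not copied
```

The original structure stays intact, which is what allows alternative interpretations to share their common parts.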

At the level of the state transition network the search is done depth-first. To handle ambiguities, DPL-functions and -relations process all alternative interpretations in parallel. In fact, the alternatives are stored in the stacks and in the C-register as trees of alternants.

In the first version of the DPL-compiler the generation rules were intermixed with the compiler code. The maintenance of the compiler grew harder when we experimented with new computational features. We therefore started to develop a metacompiler in which compilation is defined by rules. At the moment we are testing it, and soon it will be in everyday use. The amount of LISP-code has been greatly reduced with the rule-based approach, and we are now planning to install the DPL-environment on the IBM PC.

[Figure 9 diagram: the architecture of the DPL-environment. Recoverable labels: parser, facility, lexicon maintenance, information extraction system.]

Our parsers were intended to be practical tools in real production applications. It was hence important to make the produced programs transferable. As of now we have a rule-based translator which converts parsers between LISP dialects. The translator currently accepts INTERLISP, FranzLISP and Common Lisp.

Lexicon and its Maintenance

The environment has a special maintenance program for lexicons. The program uses video graphics to ease updating, and it performs various checks to guarantee the consistency of the lexical entries. It also co-operates with the information extraction system to help the user in the selection of properties.

The Tracing Facility

The tracing facility is a convenient tool for grammar debugging. For example, Figure 10 shows the trace of the parsing of the sentence "Poikani tuli illalla kentältä heittämästä kiekkoa." (= "My son came back in the evening from the stadium where he had been throwing the discus."). Each row represents a state of the parser before the control enters the state mentioned in the right-hand column. The constituents found thus far are shown by the parentheses. An arrowhead points from a dependent candidate (one which is subjected to dependency tests) towards the current constituent.

( T POIKANI TULI ILLALLA KENTÄLTÄ HEITTÄMÄSTÄ KIEKKOA . )
[illegible] .03 seconds
0.0 seconds, garbage collection time
PARSED

_PATH( )

=> (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?N
(POIKA) <= (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                N?
=> (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?NFinal
(##) (POIKA) (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)              NIL
(POIKA) => (TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?V
=> ((POIKA) TULLA) (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?VS
((POIKA) TULLA) <= (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                VS?
((POIKA) TULLA) => (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?N
((POIKA) TULLA) (ILTA) <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)                N?
((POIKA) TULLA) => (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?NFinal
((POIKA) TULLA) <= (ILTA) (KENTTÄ) (HEITTÄÄ) (KIEKKO)                VS?
((POIKA) TULLA (ILTA)) <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)                VS?
((POIKA) TULLA (ILTA)) => (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?N
((POIKA) TULLA (ILTA)) (KENTTÄ) <= (HEITTÄÄ) (KIEKKO)                N?
((POIKA) TULLA (ILTA)) => (KENTTÄ) (HEITTÄÄ) (KIEKKO)                ?NFinal
((POIKA) TULLA (ILTA)) <= (KENTTÄ) (HEITTÄÄ) (KIEKKO)                VS?
((POIKA) TULLA (ILTA) (KENTTÄ)) <= (HEITTÄÄ) (KIEKKO)                VS?
((POIKA) TULLA (ILTA) (KENTTÄ)) => (HEITTÄÄ) (KIEKKO)                ?V
((POIKA) TULLA (ILTA) (KENTTÄ)) (HEITTÄÄ) <= (KIEKKO)                V?
((POIKA) TULLA (ILTA) (KENTTÄ)) (HEITTÄÄ) => (KIEKKO)                ?N
((POIKA) TULLA (ILTA) (KENTTÄ)) (HEITTÄÄ) (KIEKKO) <=                N?
((POIKA) TULLA (ILTA) (KENTTÄ)) (HEITTÄÄ) => (KIEKKO)                ?NFinal
((POIKA) TULLA (ILTA) (KENTTÄ)) (HEITTÄÄ) <= (KIEKKO)                V?
((POIKA) TULLA (ILTA) (KENTTÄ)) (HEITTÄÄ (KIEKKO)) <=                VO?
((POIKA) TULLA (ILTA) (KENTTÄ)) => (HEITTÄÄ (KIEKKO))                ?VFinal
((POIKA) TULLA (ILTA) (KENTTÄ)) <= (HEITTÄÄ (KIEKKO))                VS?
((POIKA) TULLA (ILTA) (KENTTÄ) (HEITTÄÄ (KIEKKO))) <=                VS?
=> ((POIKA) TULLA (ILTA) (KENTTÄ) (HEITTÄÄ (KIEKKO)))                ?VFinal
((POIKA) TULLA (ILTA) (KENTTÄ) (HEITTÄÄ (KIEKKO))) <=                MainSent?
((POIKA) TULLA (ILTA) (KENTTÄ) (HEITTÄÄ (KIEKKO))) <=                MainSent? OK

DONE

[Figure 10: trace of the parsing of the example sentence]

The tracing facility also gives the consumed CPU-time and two quality indicators: search efficiency and connection efficiency. Search efficiency is 100% if no useless state transitions took place in the search. This figure is meaningless when the system is parameterized for full search, because then all transitions are tried.

Connection efficiency is the ratio of the number of connections remaining in a result to the total number of connections attempted for it during the search. We are currently developing other measuring tools to extract statistical information, e.g. about the frequency distribution of different constructs. Also under development is automatic book-keeping of all sentences input to the system. These will be divided into two groups: parsed and not parsed. The first group constitutes growing test material to ensure monotonic improvement of grammars: after a non-trivial change is made in the grammar, a newly compiled parser runs all test sentences and the results are compared to the previous ones.
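Both the connection efficiency ratio and the comparison of test results can be sketched in Python. The function names `connection_efficiency` and `changed_parses`, the dictionary representation of results and the numbers in the usage lines are all hypothetical; the paper does not report attempted-connection counts for its example.

```python
def connection_efficiency(remaining, attempted):
    """Ratio of connections kept in a result to connections attempted."""
    if attempted == 0:
        raise ValueError("no connections were attempted")
    return remaining / attempted


def changed_parses(previous, current):
    """Sentences whose parse differs between two parser versions.

    Both arguments map a test sentence to its parse result; a sentence
    missing from `current` counts as changed.
    """
    return sorted(s for s in previous if current.get(s) != previous[s])


# Hypothetical figures: 5 connections remain out of 8 attempted.
print(connection_efficiency(5, 8))  # 0.625

before = {"sentence 1": "parse A", "sentence 2": "parse B"}
after = {"sentence 1": "parse A", "sentence 2": "parse C"}
print(changed_parses(before, after))  # ['sentence 2']
```

An empty list from `changed_parses` after a grammar change would indicate that the previously parsed sentences still receive the same analyses.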

Information Extraction System

In an actual working situation there may be thousands of linguistic symbols in the work space. To make such a complex manageable, we have implemented an information system that, for a given symbol, pretty-prints all information associated with it.

The environment has routines for the graphic display of parsing results. A user can select information by pointing with the cursor. The example in Figure 11 demonstrates the use of this facility. The command SHOW() inquires the results of the parsing process described in Figure 10. The system replies by first printing the start state and then the found result(s) in compressed form. The cursor has been moved on top of this parse and CTRL-G has been typed. The system now draws the picture of the tree structure. Subsequently one of the nodes has been opened. The properties of the node POIKA appear pretty-printed. The user has furthermore asked for information about the property type ConstFeat. All these operations are general; they do not use the special features of any particular terminal.

_SHOW( )

(POIKANI) (TULI) (ILLALLA) (KENTÄLTÄ) (HEITTÄMÄSTÄ) (KIEKKOA)    START
((POIKA) TULLA (ILTA) (KENTTÄ) (HEITTÄÄ (KIEKKO)))

Tree display:

TULLA
    POIKA      Subject    (Ergative Neutral)
    ILTA       Adverbial  TimePred
    KENTTÄ     Adverbial  Ablative
    HEITTÄÄ    Adverbial  S
        KIEKKO     Object     Neutral

Properties of the opened node POIKA:

Function      Subject
Role          (Ergative Neutral)
FrameFeat     (NIL)
Polar         (Pos)
Voice         (NIL)
Modal         (NIL)
Tense         (NIL)
Comparison    (NilCompar)
Number        (SG)
Case          (Nom)
PersonN       (S)
PersonP       (1P)
Clit1         (NIL)
Clit2         (NIL)

ConstFeat is a linguistic feature type.
Default value: -Phrase
Associated values: (+Declarative -Declarative +Main -Main +Nominal -Nominal +Phrase -Phrase +Predicative -Predicative +Relative -Relative +Sentence -Sentence)
Associated functions:

[Figure 11: graphic display of the parsing result and of property information]

CONCLUSION

The parsing strategy applied for the DPL-formalism was originally viewed as a cognitive model. It has proved to yield practical and efficient parsers as well. Experiments with a non-trivial set of Finnish sentence structures have been performed both on DEC-2060 and on VAX-11/780 systems. The analysis of an eight-word sentence, for instance, takes between 20 and 600 ms of DEC CPU-time in the INTERLISP-version, depending on whether one wants only the first parse or, through complete search, all parses for structurally ambiguous sentences. The MacLISP-version of the parser runs about 20 % faster on the same computer. The NIL-version (Common Lisp compatible) is about 5 times slower on VAX. The whole environment has also been transferred to FranzLISP on VAX. We have not yet focused on optimality issues in grammar descriptions. We believe that rearranging the orderings of expectations in the automata would improve efficiency.

REFERENCES

1. Lehtola, A., Compilation and Implementation of 2-way Tree Automata for the Parsing of Finnish. M.Sc. Thesis, Helsinki University of Technology, Department of Physics, 1984, 120 p. (in Finnish)

2. Nelimarkka, E., Jäppinen, H. and Lehtola, A., Two-way Finite Automata and Dependency Theory: A Parsing Method for Inflectional Free Word Order Languages. Proc. COLING84/ACL, Stanford, 1984a, pp. 389-392.

3. Nelimarkka, E., Jäppinen, H. and Lehtola, A., Parsing an Inflectional Free Word Order Language with Two-way Finite Automata. Proc. of the 6th European Conference on Artificial Intelligence, Pisa, 1984b, pp. 167-176.

4. Winograd, T., Language as a Cognitive Process. Volume I: Syntax,
