American Journal
of
Computational
Linguistics
Microfiche
37
28MAKOTO NAGAO
ANDJ U N - I C H I
T S U J I I
Department
of Electrical
Engineering
Kyoto University
Kyoto,
Japan
ABSTRACT
PLATON
(Programming
LAnguage
for
Tree
OperatioN)facilities
of
patternmatching
and
flexible backtracking,
language
is developed
t~simplify
writing
analysisprograms
The
patternmatching
processhas
the
facility
t o e x t r a c tsub-
input
sentence
and
invoke semantic
and
contextualchecking fo?.
actions
between
syntactic
and
other
componentsare
easily
obt
processing r e s u l t s
i n a
failure,
amessage
which
expresses
t :failure
will be
sentup.
The
control
w i l l
be
modified
accoru
enables
us t o
write
f a i r l y
complicated non-deterministic progrmanner.
An example ofstructural
analysisusing
PLATON
is a l s
-
,le1.
The
1
language
Erom t h c Intc
-
-.
rf
Iseof
.Th
i
:-; i n : ; e d "
I
I n t r o d u c t i o n
In
t h i s paperwe describe
anew
programming languagewhich
is designedto
facilitate
the writing ofnatural
language grammars.
A s i m p l e structuralanalysis program using this language
is given as
an
example.
Thereare
twokey
i s s u e sin
analyzing
natural language by computer:1) how
to representknowledge (semantics, pragmatics) and t h e s t a t e (context) of t h e world, and
2) how to advance the programming technology appropriate for syntactic-
semantic, syntactic-contextual interface. The point in d e s i g n i n g a programming
T r a d i t i m a 1 systems whi.ch r e p r e s e n t grammars a s a s e t of r e w r i t i n g
r u l e s u s t i a l l y 'have poor c o n t r o l mechanisms, and flexible i n t e r a c t i o n between
t h c s y n t a c t LC and other con~ponents is n o t possible. Systems in which r u l e s
o f g r - ; l n m l i l r s . r l r c cn11,i-dilcd i n p r o c c d r l r c s , a n . tllc other hand, make it p o s s i b l c
t o i n t t - r m i s ttrct 6 yrltnc t ' l c and seniant i c a n a I y s t ? s i n an i r ~ t i i n n t e way. Llowcvcr,
thest. s y s t c o s are apt. t . 0 c.lestroy the i n t c . l l i g i h : i l ity and r e g u l a r i t y o f
n a t u r a l language g r a m a r s , because i n these systems b o t h r u l e s and their
c o n t r o l mechanisms are contained i n the same program.
Recently various syscems for n a t u r a l language a n a l y s i s h a v e been
developed T . Winograd's (1971)
"PROGRAMMAR"
i s a t y p i c a l example ofprocedure oriented systems. In
this
system the s y n t a c t i c and othercomponents
can interact c l o s e l y i n the course of analyzing s e n t e n c e s . However, d e t a i l s
o f the program are lost i n the r i c h n e s s o f this i n t e r a c t i o n . LINGOL,
developed by V . Pratt ( 1 9 7 3 ) a t
MIT,
i s a language appropriate t o syntax-semantics interface and in which it is easy t o write grammars in t h e form of
r e w r i t i n g r u l e s . The TAUM group a t Montreal University ( 1 9 7 1 ) has e v o l v e d a
programming languagc name :I System-Q in which expressions of t r e e s , strings
and lists o f them can b e matched a g a i n s t p a r t i a l expressions ( s t r u c t u r a l
patterns) c o n t a i n i n g variables and can b e transformed i n any a r b i t r a r y
fast1 i 0 1 1 .
Thc augmcn t c d t r i 1 1 1 ~ i L I o n ilc twork (A1l'N) proposccl by
W
Woods (19 70)from our p o i n t of view gives an e s p e c i a l l y good framework for n a t u r a l
Xa~~guage a n a l y s i s systems. One of- t h e most a t t r a c t i v e features is t h e clear
dtscrimination between grammatical rules and t h e control mechanism. T h i s
enables u s t o develop t h e model by a d d i n g various f a c i l i t i e s t o its control
mechanism.
30
1.
It p r o v i d e s power o f e x p r e s s i o n e q u i v a l e n t t o t r a n s f o r m a t i o n a lgrammars
2. I t m a i 1 1 t a i . n ~ much of the r e n d a b i l i t y of c o n t e x t - f r e e grammars.
3 . R u l e s of n grammar can
b e
changed easily, sowe
canimprove
t h e mthrough a t r i a l - a n d - e r r o r p r o c e s s whi1.e w r i t i n g t h e grammar.
4. I t i s p o s s i b l e t o impose various types o f s e m a n t i c and p r a g m a t i c
c o n d i t i o n s on the b r a n c h e s between s t a t e s . By d o i n g this, c l o s e i n t e r a c t i o n s
between the s y n t a c t i c and
other
components c a n b e e a s i l y a c c o m p l i s h e d .However ATN h a s the f o l l o w i n g s h o r t c o m i n g s , e s p e c i a l l y when we a p p l y
i t to the p a r s i n g of Japanese s e n t e n c e s :
1.
It scans
words one-by-one
from
t h eleftmost
end of
an
i n p u ts e n t e n c e , c h e c k s t h e a p p l i c a b i l i t y of a r u l e , and
makes
t h e t r a n s i t i o n f r o m one s t a t e t o a n o t h e r .This
method may b e w e l l s u i t e d for E n g l i s h s e n t e n c e s ,but b e c a u s e t h e o r d e r of words and p h r a s e s i n Japanese s e n t e n c e s is r e l a t i v e l y
free, i t is preferable t o check t h e a p p l i c a b i l i t y o f a r u l e by a flexible
p a t t e r n - m a t c h i n g method. I n a d d i t i o n , w i t h o u t a p a t t e rn-matching mechanism,
a s i n g l e rewriting r u l e of a n o r d i n a r y grammar i s often to b e expressed by
several rules belonging to d i f f e r e n t s t a t e s i n Woods ATN parser.
2. An
ATN
model e s s e n t i a l l y performs a k i n d of top-down analysls of s e n t e n c e s . T h e r e f o r e r e c o v e r y f ronl f a i l u r e s in p r e d i c t f . o n is most d1.f f i c u l t .C o n s i d e r i n g these f a c t o r s , w e d e v e l o p e d PLATON ( a Programming
Lhnguage f o r Tree-Ol>cratioN), which is b a s e d on the A1'N model a n d has v a r i o u s
a d d i t i o n a l c a p a b i l i t i e s such a s pattern-matching, f l e x i b l e b a c k t r a c k i n g , and
s o on. As in S y s t e w Q and LINGOL, PLATON's p a t t e r n - m a t c h i n g f a c i l i t y makes
i t e a s y to write rewriting r u l e s . Moreover, i t extracts s u b s t r u c t u r e s from
31
These may be arbitrary LISP functions d e f i n e d by t h e u s e r , the arguments of
w h i ~ n are t h e extracted s u b s t r u c t u r e s .
A b a c k t r a c k i n g mechanism is a l s o necessary f o r l a n g u ~ e understanding
a s in other fields of a r t i f i c i a l intelligence. During t h e analysis, various
sorts of h e u r i s t i c information should b e u t i l i z a b l e . A t any s t a g e , analysis
based on criteria which m a y r e l a t e t o syntactic, semantic
or
contextual
considerat i o n s taken s e p a r a t e l y may be u n r e l i a b l e . The result which f u l f i l s
all the criteria, however, w i l l b e a c o r r e c t one.
The
program s h o u l d b edesigned s u c h that i t can choose the most satisfactory rule from many
c a n d i d a t e s according t o the criteria at hand. In further processing, i f the
choice
i s found t o b e wrong by other criteria, t h e program must be able t obacktrack
t o the p o i n tat
which the .relevant decision w a s made. In PLATON we can e a s i l y s e t up arbitrary numbers of d e c i s i o n p o i n t s in t h e program.Then,
i f subsequent p r o c e s s i n g r e s u l t s i n some failure, c o n t r o l w i l l comeback t o the points relevant t o t h e cause of the f a i l u r e .
11. Pattern-matching
Before proceeding t o the detailed description o f PLATON, w e w i l l
explain
t h e r e p r e s e n t a t i o n schema for i n p u t sentences and parsed trees. The process of analyzing a sentence, roughly speaking, may be regarded as theprocess of transforming an o r d e r e d list of words t o a tree s t r u c t u r e ,
which
shows
e x p l i c i t l ythe
interrelationships
of each wordin
the i n p u tsentence.
During the process, trees which correspond to the parts already analyzed, and
1 h t s
which have
not been processed y e t , may c o e x i s t t o g e t h e r i n a s i n g l e s t r u c t u r e . We therefore wish to represent such a c o e x i s t i n g structure o ftrees and lists. A l i s t s t r u c t u r e is a s t r u c t u r e i n which the order of element
root
node and
several nodes
which
are tied
t othe
rootnode
by
d i s t i n g u i s h a b l e
r e l a t i o n s . Because
relations between
the
root
and t h e
other
nodes are
e x p l i c i t l y
s p e c i f i e d ,
t h e
o r d e r
of
nodes
in
atree
is
changeable except
for
the
root node
which is
placed
in
the leftmost p o s i t i o n .
Different
matching
schemas
w i l l
be
a p p l i e d t otrees and
lists.
The
formaldefinition
of
suchcoexisting structures
is a s
f o l l o w s .
<structure>
i s
the
fundamental data-structure into which
a l l data processed
by PLATON
must
betransformed.
Hereafter
we refer
t o
t h i s
as the "structure"
The
formal
definition
of
<structure)
is:
(list)
::=($
tstructures>
)(structures>
: :=1
<structure>
(structures
>
<tree>
: := (node,
I
((node)
<
branches)
)<branches>
:
:=(brancl~)
1
(branch,
branches
)(branch)
::=( < r e l a t i o n >
(tree,
)(
node,
: := C l i s t>
I
!UBITRARY
LISP-ATOM
<
relation
>
: := ARBITRARY LISP-ATOMA
simple
example
is shown in Figure
1.
Coexisting
Structure
of
Trees and Lists
Corresponding Expression
i n
PLATON
[image:5.777.61.668.296.952.2]Two l i s t s which have the same elements but d i f f e r e n t orderings (for example,
( X
AB
C) and ($ AC
B))
,
s h o u l dbe
regarded as d i f f e r e n t s t r u c t u r e s . Onthe other hand, two tree structures s u c h as ( A ( R1
B
) (R2
C ) ) and( A (
R2
C ) (R1
B
) ) areregarded
as i d e n t i c a l . Besidesthe
usual rewriter u l e s which treat such s t r i n g s ,
s
tructural p a t t e r n s which c o n t a i n variableexpressions are permitted in PLATON. The PLATON-interpreter matches
structural patterns containing variable expressions a g a i n s t the structure
under process a n d checks whether the s p e c i f i e d p a t t e r n is f o u n d in it. At
t h e same time, t h e variables in the pattern are bound t o the corresponding
subs tructurzs
.
V a r i a b l e s i n patterns are indicated as
: X
(X
i s an arbitraryLISP
atom). Thefollowing
can be expressed by variables inthe
above definition(1) arbitrary numbers o f
<
s t r u c t u r e s >,
that i s to say, l i s t elements inthe definition of
<
list>
( F i g u r e 2,Ex.
1).
We can also specify theStructural Pat terns Structures R e s u l t s o f M a t c h i n g
Example 1
SUCCESS I n A :K
C C
Example 2
SUCCESS
[image:6.777.78.688.486.923.2]SUCCESS
3 4
number of l i s t elements by i n d i c a t i n g
variables
as :X+number. Forexample, t h e v a r i a b l e
:D2
w i l l match with two elements in a list.(2) a r b i t r a r y numbers of <branches
>
,
i n
t h e definition of (tree>
( F i g u r e 2,
Ex.
2).(3)
<
tree)
in
t h e
d e f i n i t i o n
of
<
branch)
(Figure2,
Ex.
3 ) .
We s h a l l fail such s t r u c t u r a l p a t t e r n s ( s t r u c t u r e - 1
>
.
By usingthe
same variable several times i na
p a t t e r n , we can express a structure i n w h i c h t h esame sub-structure appears i n two o r more different places.
The
c h a r a c t e r !i n
a l i s tindicates
that t h e n e x t e l e m e n tfollowing
t h e c h a r a c t e ri s
o p t i o n a l .
111
Basic
O p e r a t i o n sof
PLATONA
grammar, whether
g e n e r a t i v e o r a n a l y t i c a l . i s r e p r e s e n t e d as ad i r e c t e d graph
with labeled
s t a t e s andbranches.
There i s one s t a t e d i s tin-guished as the Start S t a t e and a s e t of s t a t e s c a l l e d F i n a l S t a t e s . Each
branch i s a rewriting r u l e and has the f o l l o w i n g e l e m e n t s :
(1)
applicability
c o n d i t i o n s of t h erule,
t y p i c a l l y r e p r e s e n t e das
astructural
p a t t e r n( 2 ) a c t i o n s
which
must
beexecuted,
i fthe r u l e
is
a p p l i c a b l e(3)
a s t r u c t u r a lp a t t e r n
i n t o whichthe
i n p u t s t r u c t u r e s h o u l d b e t r a n s f o r m e d .Each
s t a t e h a s
several branches ordered a c c o r d i n g
t o
the
p r e f e r e n c e of thorules.
When
the
control jumps t o a s t a t e , i t checks t h e rules associatedwith
the s t a t e one-by-one until it finds an a p p l i c a b l erule.
If
such a ruleis
found,the
i n p u t s t r u c t u r ei s
transformed
i n t o
a n o t h e r s t r u c t u r e s p e c i f i e d35
In
a d d i t i o n t o the above basic mechanism t h e system i s provided with push-down and pop-up operations.The
push-down o p e r a t i o n i s such that i n t h e process of a p p l y i n g a r u l e , several s u b s t r u c t u r e s are extracted from t h ewhole s t r u c t u r e by v a r i a b l e b i n d i n g mechanisms of pattern-matching. Then each
i s analyzed from a d i f f e r e n t state. The pop-up operation i s such that after
each substructure i s analyzed appropriately, control comes back to the
suspended r u l e and
execution
continues. Usills these operations, embedded s t r u c t u r e s can b e handled e a s i l y (See Figure 3).
F i p u r e 3 S t a t e Diapram
Table 1 shows the formal d e f i n i t i o n c f a grarmr o f PLATON (See follow-
i n g p a g e ) .
It
shows t h a t branches or rewriting rules in anATN
parsercorrespond to s i x - t u p l e s ( i e., ( p c o n )
,
< s t r x >,
dcon>,
(c
t r a n s r ),
( 4 a c t s > ) , dendb
.
< s t r x > corresponds to t h e left side of arewriting rule and describes the s t r u c t u r a l pattern t o which a rule is
applicable. ( strx
>
i s , by d e f i n i t i o n(1)
/
or
[image:8.781.78.673.169.699.2]TABLE 1 Formal Definition of Grammar in PLATON
( s t a t e s , :: = < s t a t e >
1
< s t a t e > 4 s t a t e s >4 rules)
. .
-
c r u l e > < r u l e s >- =
-
I
4 rule
>
. .
. .
= ( ( pcon) c s t r x , < c o n > ( <trans> ( 4 acts? tend 3 )<
trans>..
.
.
=I
<
t r a n s i 0 C trans,Iregister-name,
<
transit, : : = ( ( (state-name,
<structure-2> ) derrorps)4 variable-nam-
4
pros,..
.
.
--
< p r o >I
< p r o > 4 pros>4
pro>
..
..
= (EXEC < t r a n s > )I(TRANS
( (state-name> 4 s t s y > ) )( e n d '>
..
..
= (NEXT estate-name) <stry>I
(NEXTB 4 s ta te-name> rstry 2 )((POP
4 stry> )I
(FM
<
f a i l u r e - m e s s a g e ) .facts>..
. .
=I
4 a c t > < a c t s >< s e t
>
: : = <form,
I
(SR (register-name> d form>I
(SU 4 regis ter-name>
<
form>
)~ ( S D <register-name
>
cf
orm. )<
s t r x>
..
. .
= <structure-I>1
/
<s try,
..
. .
= <structure-2,1
1
dpcoTl),<con> : : = <form>
( form) : : = (GR .<register-name> )
I
(GV (variable- name>
?
~ ( T R
structure-2.
)I(TR
/ )
1
ARBITRARY
LISP
FORM<variable- :: =
: X
(X
isan
arbitrary LISP atom) name 7<register- : : =
/X
(X is an arbitrary
LISP
atom) [image:9.777.70.699.104.917.2]shows
t h a t a r u l e i s a p p l i c a b l e no matter what the s t r u c t u r e u n d e r process i s The v a r i a b l e s u s e d i n ( s t r u c t u r e - 1 ) are bound t o c o r r e s p o n d i n gs u b s t r u c t u r e s when m a t c h i n g succeeds. T h e r e s u l t s of
Example
1 (See Figure 2 ) i n d i c a t e that the v a r i a b l e:K
i s bound t o the s u b s t r u c t u r e (* ( B (R1 C))
D
)The
s c o p e of v a r i a b l ebinding
i s l i m i t e d t o w i t h i n t h e r e a l m of the p a r t i c u l a r r u l e . The same variablename
i n
d i f f e r e n t r u l e s has d i f f e r e n ti n t e r p r e t a t i o n s . I n t h i s sense, :X-type v a r i a b l e s i n
<
s t r u c t u r e - I > arec a l l e d
Local
Variables. On t h e o t h e r hand, w2 can s t o r e c e r t a i nkinds
ofre-
s u l t s
from
the
a p p l i c a t i o n ofr u l e s
in
r e g i s t e r s
and r e f e rback
t o them i nd i f f e r e n t r u l e s . These cons tftue v a r i a b l e s which we c a l l r e g i s t e r s . They
a r e r e p r e s e n t e d by t h e symbols / X
(X
is an a r b i t r a r yLISP
atom).Besides the pattern-matching,
<
peon> and<
con ) can . ~ l s o checkthe
a p p l i c a b i l i t y of arule.
Certain
p a r t s ofthe results
fromthe
applica-t i o n of p r e v i o u s rules are c o n t a i n e d i n r e g i s t e r s , n o t in the s t r u c t u r e .
We
can
check
the c o n t e n t s
of t h e s e r e g i s t e r s byusing
<
pcon>
-part functions
like GR,
GU, e t c .(these
f u n c t i o n s are l i s t e di n
T a b l e 2) and o t h e rLISP
functions
d e f i n e dby
theusual
LISPfunction,
DEFINE.
(Seefollowing
page for T a b l e 2 . )Semantic and c o n t e x t u a l c o - o r d i n a t i a n between s u b s t r u e t u r e s can b e
checked by using appropriate functions in
the
(con > - p a r t o f a r u l e .Semantic and contextual analyses
cannot
be
expressedin
t h e formof
s i m p l er e w r i t i n g
rules.
These
analyses havediffering
r e q u i r e m e n t s s u c h a s l e x i c a linformation
about wordswhich
mayin
turn represent knowledgeof
t h eworld
and contextual
i n f o r m a t i o n whichmay.
express t h e s t a t e oft h e
world.We
canTABLE
2 Functions ofPLATON
w
Func t ion
SR
SV
GR
GV
Argument
(regis ter-name>
LISP
-
<
form>4 var i a b le-nam-
LISP
-
4 form>(regis ter-name>
( v a r i a b le-name>
E f f e c t
SR s t o r e s the r e s u l t of the e v a l u a t i o n of t h e 2nd argu- ment i n the r e g i s t e r .
SV s t o r e s t h e
result of
t h e e v a l u a t i o n of t h e 2nd argu- ment i n the v a r i a b l eGR g e t t h e content of the r e g i s t e r
GV g e t s t h e value of t h e
variable
TR
SU
SD
GU
PUSHR
4 Value.-
-t h e r e s u l t of the e v a l u a t i o n of t h e
2nd
argumentA
the r e s u l t
of
the e v a l u a t i o n of the 2nd argumentthe c o n t e n t of the r e g i s t e r
the
value
ofthe variable
I
<structure-2, o r/
,
rregis ter-name,
LISP
-
4form)
Cregis ter-name
LISP
-
<form>(register-name>
4 regis ter-name
>
LISP
-
4 form), TR transforms t h e v a r i z b l e s
and r e g i s t e r s i n t h e s t r u c - t u r a l p a t t e r n i n t o t h e i r v a l u e s .
SU sets
the reigster
of
t h e-
higher level p r o c e s s i n gSD sets t h e r e g i s t e r of t h e
the transformed s t r u c t u r e
the result of the e v a l u a t i o n of t h e 2nd argument
I
lower l e v e l processing.
GU g e t s t h e c o n t e n t of the
register
of t h e higherlevel.
PUSHR
is defined as the following.(SR r regis ter-name>
(CONS
4 form,(GR < r e g i s t e r - name3
1 1)
the r e s u l t of t h e e v a l u a t i o n of t h e
2nd argument
the
content of the r e g i s t e rthe result of the
[image:11.777.72.702.150.951.2]For example, s u p p o s e
s t r x = ( ADJ (
TOK
:N
) ) (N(TOK
: N 1 ) ) : I )con = ( SEM :N : N 1 )
Here TOK i s the l i n k between a word and i t s p a r t o f speech. :N and : N 1 are
t h e words o f an i n p u t s e n t e n c e . SEM i s
a
f u n c t i o n d e f i n e d by t h e u s e r whichc h e c k s t h e s e m a n t i c c o - o r d i n a t i o n between t h e a d j e c t i v e
:N
and t h e noun : N 1 . By t h i s f u n c t i o n SEM, w e can s e a r c h , i f n e c e s s a r y , t h r o u g h b o t h l e x i c a lentries and the contextual d a t a bases.
U i t h t h i s a p p r o a c h , i f a c e r t a i n s y n t a c t i c p a t t e r n is found i n t h e
input s t r u c t u r e , i t i s p o s s i b l e f o r an a p p r o p r i a t e semantic r u n c t i o n t o b e
c a l l e d . Hence t h e i n t i m a t e i i i t c r c c t i o n s between s y n t a c t i c and s e m a n t i c
components can b e o b t a i n e d easily w i t h o u t d e s t r o y i n g the clarity o f n a t u r a l
language grammars.
A r b i t r a r y LISP-forms can be a l s o used i n < a c t > - p o r t i o n . They w i l l
b e e v a l u a t e d when t h e r u l e i s a p p l i e d . I f n e c e s s a r y , w e can s e t i n t e r m e d i a t e
r e s u l t s i n t o r e g i s t e r s and v a r i a b l e s by u s i n g t h e f u n c t i o n s l i s t e d i n T a b l e 2
( e n d
>
c o m p r i s e s f ~ u r v a r i e t i e s , and r u l e s are divided i n t o fourt y p e s a c c o r d i n g t o t h e i r ( end
>
t y p e s .1.
NEXT-type
: The<
end+ i s i n t h e form (NEXT d s t a t e - n a m e , 4 s try> ).
The
b s t r y ) c o r r e s p o n d s t o t h e r i g h t s i d e o f a r e w r i t i n g r u l e , andr e p r e s e n t s t h e t r a n s f o r m e d s t r u c t u r e . A r u l e o f this t y p e c a u s e s
s t a t e - t r a n s i t i o n t o t h e d s t a t e - n a m e )
,
when i t i s a p p l i e d .2. NEXTB-type: This r u l e a l s o c a u s e s state-transition. Unlike w i t h t h e
NEXT-type, s t a t e - s a v i n g is done and i f f u r t h e r p r o c e s s i n g r e s u l t s i n
some f a i l u r e s , c o n t r o l comes back t o t h e s t a t e where t h i s r u l e i s a p p l i e d .
The e n v i r o n m e n t s , t h a t i s , t h e c o n t e n t s of v a r i o u s r e g i s t e r s w i l l be
40
3. POP-type :
The
(end>
-part of t h i s type is in the form (POP < stry
>
)When
i t i s applied, t h e processing of this l e v e l is ended a n d thec o n t r o l r e t u r n s t o t h e h i g h e r l e v e l with t h e v a l u e s t r y
>
.
4 .
FM-type: The < e n d ) - p a r t of this t y p e i s i n t h e form (FM <failure-message) ) . The s i d e e f f e c t s of he processing a t t h i s l e v e l , t h a t i s ,
r e g i s t e r s e t t i n g s and s o on, a r e c a n c e l l e d ( s e e s e c t i o n
4 ) .
I n
< s t r y>
we can use two kinds ofvariables,
t h a ti s ,
the variables used i n d s t r x>
and r e g i s t e r s . We f i n d t h i s s t r u c t u r a l p a t t e r n , c a l l e d4 s t r u c t u r e - 2
>
,
more s u i t a b l e f o r w r i t i n g t r a n s f o r m a t i o n a l r u l e s t h a nWoods BUILDkoperation. By way of i l l u s t r a t i o n c o n s i d e r the following:
i n p u t s t r i n g =
C D E ( A ( R l ( *
B ) ) ) F G )strx
=(8
:I
( A (Rl :N
) ) : J )StrgT = (* (A (
F
U
(* :I:N
) ) (R2
/REG
) ) :J)t h e c o n t e n t of /REG = ( G (
R3
H
) )A s t h e r e s u l t of matching, t h e v a r i a b l e s :I,
:N
and:J
a r e bouad t o t h es u b s t r u c t u r e s
(&
C
DE
) ,($
B)
and(3
F
G ) r e s p e c t i v e l y .The
r e s u l t of e v a l u a t i n g t h e<
s t r y>
i s( ( A (
R1
(#C
D E
B ) ) ( R2 (G
(R3
H
) ) ) )F
G
) .I f
the
r u l ei s
a
POP-typeone,
then this
s t r u c t u r e w i l lbe
returned
t o t h e h i g h e rlevel
processing. If i t is NEXT- or NEXTB-type, then the c o n t r o l w i l l move t o t h e s p e c i f i e d s t a t e with t h i s s t r u c t u r e .IV
Push-downand
Pop-up OperationsBy means of NEXTB-type r u l e s , w e can s e t up decision p o i n t s i n a
(
strx>
;and
a t the
same time,
e x t r a c t s substructuresfrom
the
input
st'king
From the
structural
d e s c r i p t i o n
i ti s
predicted
that
the
substructures
may
have particular
constructions,
thati s ,
t h e y
may
comprise
noun phrases,
relative
c l a u s e s
or
whatever.
It
is-necessary
to transfer the s u b s t ~ r ~ c t u r e s
t o s t a t e s
appropriate f o r
analyzing
these
constructions
predicted
and
to return
the
analyzed
structures
back
into the
appropriate places
In PLATON,
these
operations
can
bedescribed
in
the
C t r a n s
>
-part
o f
a
rule. For example,
suppose
the <trans
>-part
of arule is
( ( (
Sl
:K :K
) ) ( (S2
( :I : J ) / R E G ) ) )When
the
cantrol interprets
t h i s
statement,
the
substructures corresponding
t o
the
v a r u b l e:K
and (:I
: J ) are transferred t othe
s t a t e sS1
and S 2respectively
If
the p r o c e s s i n g s s t a r t i n g
from
theses t a t e s are normally
completed (by
a POP-type r u l e ) ,then
the
r e s u l t s arestored
i?
the variable
:K
and the
regzster
/REG.
In
this manner,
bymeans
ofthe
push-downand
pop-up
mechanisms,
substructures
can beanalyzed
from appropriate states.Processing
from k e s e s t a t e s , however,may
sometimes resultin failure,
That
i s ,
predictions that c e r t a i n relationshipsw i l l
b efound
among
t h eelements
o f substructures maynot
be
f u l f i l l e d . In such i n s t a n c e s the pusheddown
s t a t ew i l l
send an
error-message appropriate t othe
causeof
the
failureby an W t y p e r u l e . An FM-type rule
points
o u t
that a certainerror
has occurredin
the processing.If
NEXTB-type rules wereused
i n
the previousprocessing
a tthis
l e v e l ,control
w i l l
goback t o the most
recently
usedNEXTB-type r u l e .
If
NEXTB-typerules
were
not
used
at t h i s processing
l e v e l ,tne error-message s p e c i f i e d by the FM-type
rule
w i l l be sent t othe
< t r a n s >
H i g h e r level
Lower l e v e l p
processing
---
/
/ \
/ \
I \
/ \
r o c e s s i n g ~ \
- - - I
-
I
\
I
\
i
\
/
NExTB-type rules weren o t a p p l i e d in this l e v e l
Lower level processing
c----
----
[image:15.777.71.681.126.876.2]/This r u l e will b e applied n e x t .
43
According t o
these
error-messages,
control-flow can
be
changed
appropriately. For
example, w e can d i r e c t
processings
by
describing
the
4
trans
>
-part i nthe
following
way.( ( (
s l
:K
:K
) (ERR1
(EXEC
( (S5
:K
:K
1)
( (56
(:I
:J )/REG
) ) ) )( ERR2 ( T U N S ( S8
/ ) ) I
( ( S 2
(j6
: I:J
)/REG
) ) )I n t h e above example, the processing of the substructure
:K
from the s t a t eS1
will
produce one of t h e f o l l o w i n g three r e s u l t s . According to thereturned value, the a p p r o p r i a t e step w i l l b e taken:
(1)
Normal
return:
the
processingof
:K
is
ended by a POP-typerule.
The r e s u l t i s s t o r e di n
thev a r i a b l e
:K
and
the n e x t push-downperformed, that i s : I : J ) w i l l b e transferred t o the s t a t e
S2.
(2) Return w i t h
an
error-message: t h e processingof : K
r e s u l t sin
a f a i l u r e and an FM-type r u l e sends up an error-message.
If
t h e message isERR1, them
:K
and :f:J
) w i l l b e analyzed from the s t a t e s S5 and S 6respectively (EXEC-type).
If
i t i s ERRS, the i n t e r p r e t e r w i l l give up t h eapplication o f the p r e s e n t r u l e , and pass
the
control to another s t a t e S8(TRANS-
t y p e ) .If
it is neither ERRl n o r ERR2, t h e same s t e p as ( 3 ) w i l lbe
taken( 3 )
Re
turn w i t hthe
value
NIL:the
processing Prom t h e stateS1
w i l l send up
the
valueNIL if
it runs into a blindalley,
that is, there are no a p p l i c a b l e r u l e s . The interpreter w i l l give up the a p p l i c a t i o n of the p r e s e n trule and proceed to the next rule attached t o t h i s s t a t e .
Mechanisms, such that control flow can be appropriately changed
according to the error-messages from lower level p r o c e s s i n g s are n o t found in
Woods A T N parser, We can o b t a i n flexible backtracking f a c i l i t i e s by combining
V
A S i m p l eExample
We are now d e v e l o p i n g a d e d u c t i v e question-answering sys tern with
n a t u r a l language i n p u t s
--
J a p a n e s e sentences. The i n t e r n a l data-base i sassumed t o be a s e t o f deep case s t r u c t u r e s o f i n p u t s e n t e n c e s . We a d o p t e d
and m o d i f i e d Fillmore's (1968) case grammar t o analyze the i n p u t of J a p a n e s e
s e n t e n c e s . Japanese i s a t y p i c a l example o f a n
SOV-language
i n which t h e o b j e c t and o t h e r c o n s t i t u e n t s governed by a verb usually appear b e f o r e :heverb i n s sentence. A typical c o n s t r u c t i o n of a Japanese sentence i s shown
in
F i g u r e 5 .F i g u r e 5 T y p i c a l Construction of a J a p a n e s e S e n t e n c e
A
verb
maygovern
several
noun phrases
precedingit.
Arelative
clause
modifying a noun may a p p e a r i n t h e form
--
verD+
noun--
The r i g h tboundary of t h e clause
is
easilyidentified
by f i n d i n g theverb.
The left boundary i s o f t e n much more d i f f i c u l t t o i d e n t i f y .I n
F i g u r e 5, t h e nounphrase N P k r i s a c a s e element of t h e verb V On the o t h e r hand, t h e noun
phrase NPi is governed by t h e verb
Va
Because t h e r u l e of p r o j e c t i o n s holdsi n JaFnese as i n o t h e r languages, a l l
the
noun phrases betweenNP
c'+
I andV are governed by V
,
andthe
noun phrases b e f o r eNP,:
are
governed byVA
However, i n the cou.rse of a n a l y s i s , s u c h b o u n d a r i e s c a n n o t be determineduniquely.
The
a n a l y s i s programfixes
a temporary boundaryand
proceeds t othe next s t e p i n p r o c e s s i n g I f the temporary boundary i s not c o r r e c t , the
[image:17.777.166.515.424.518.2]a t
which
the temporary
boundary
wasf i x e d .
Now
wew i l l show
asimple
example
of
structural
a n a l y s i s
by
PLATON
The
example explains how the
backtracking
f a c i l i t y
i s
used i n analyzingJapanese
sentences.Because
we
want t o v i s u a l i z e t h eoperations
ofPLATON
w i t h o u t
b o t h e r i n g
w i t h
microscopic d e t a i l s
of
Japanese
sentences,
wew i l l
take
animaginary problem
as
an
example.
The parser
which
is
w r i t t e n
in
PLATON
is
described
in
another
paper
by
M
Nagao
and
J. T s u j i i
(1976)
An
input
s t r i n g
is assumed
to
be
al i s t . The elements
of
t h elist
are
i n t e g e r s andtrees are
i n
the
form
of
( X (SUM 0)). Here'X'
may
b e
regarded as a term m o d i f i e d
by
'SUM
0'These
twokinds
of elements arearranged
i n
an arbitraryorder, except that
the
last
element i sthe
tree(X(SUM
0)).The
f o l l o w i n gis
an
example
of
a n
i n p u t s t r i n g :( 5 2
1 3
(x
( S r n 0)) 31
(X
(SUM 0)) 2 2 (x
(SUM 0)) )F i g u r e
6
An Example S t r i n g t o b eAnalyzed
The
result
of thetransformation
i s expected tobe
i n
the
f o l l o w i n g fotm:( ( X
( S m 4 ) )
( X S m 6 ) )
(X
(Sm9))
1
This
result
i sregarded
as representing thefollowing
relationships betweenintegers
and
'X'
.
The number associated w i t h an
'X'
by the r e l a t i o n'SUM'
shows thesum
of
the integers which are governed by theX
.
We can look upon the relationsr
between integers and an X' as the r e l a t i o n s between noun phrases and t h e verb
i n Japanese sentences.
The
r e s u l t of the a n a l y s i s is assumed t o satisfy46
(1) Governor-governed r e l a t i o n s h i p s between i n t e g e r s and an
'X'
must obeythe
p r o j e c t i o nrule
(i.e., c l a u s e s d o n o t o v e r l a p ) .( 2 ) A s a s i m u l a t i o n o f a semantic r e s t r i c t i o n , w e attach a c o n d i t i o n that
the sum o f the i n t e g e r s governed by an ' X ' should n o t erceed t e n .
( 3 ) As a s i m u l a t i o n of a c o n t e x t u a l r e s t r i c t i o n , w e attach the c o n d i t i o n
t h a t a r e s u l t
(F
( X(SUM
num-1))(X
(SUM num-2)).
.
.
.
( X (SUM num-N)) ) s h o u l d maintain the r e l a t i o n , n u m l4
num-2-
5
. . . ..
-
< u m - N .s e t 05 r u l e s i s shown i n the following. The c o r r e s p o n d i n g state- diagram i s shown i n F i g u r e 7 .
START
NEXTB NEXTB
POP
NEXT
S L ? -1-
s t r x :
=:I
:I1 ( X(SUM
:N))
:J)
con: = ( GREATERP
10
(PLUS :N
: 11) ) a c t : = ((SV
:N
(PLUS:N
:I1
) )(PUSHR /REG
: I l l
)end: =
(NEXT
SUMLmC)c
:I
(X
(SUM
:N)) : J) )-2- s t r x : = (* :I
(X
(SUM :N)) r J )con: = (CONTEXTCHECK /RESULT
(TR
(X (SUM
:N))))
a c t : =NIL
end: =
(NEXT BACKTRACK
/)-3- strx: =
(%
:I
(X (SLW :N))
:J)
con: =:
T
act: =
NIL
end: = (FM-ERROR)
-4- s t r x : =
( 8
)con: =
T
a c t : = ( (SR /RESULT (CONS ' X /RESULT ) ) )
end: = (POP /RESULT)
BACKTRACK
-1- s t r x : = :I (X (SLW :PI)) :.J)
con: =
T
a c t : = ( (SR /REG N I L )
(SR /RESULT
(APPEND
/RESULT (TR
(X (SUM :N))))) ) end: = (NEXTB SUMUP (S# :I :J ))-2- s t r x : = (* :I ( X (SUM :N))
:J)
con: = Ta c t : = ( (POPR /TEMP /REG)
(SV :N (MINUS :N /TEMP)) )
The i n p u t
string i s the! l i s t shown i n F i g u r e 6. Since the start s t a t e is S W P , the f i r s t r u l e attached t o this s t a t e i s a p p l i e d . T h i s r u l e w i l lfind the l e f t m o s t
'x'
and an integer just b e f o r e t h e' x '
( b y SUMUP -I-, strx).The
v a r i a b l e :I1 i s bound to tius i n t e g e r . This i n t e g e r i s added t o t h e sum of the i n t e g e r s , :N, i f the t o t a l does n o t exceed t e n (SUMUP -I-, c o n ) .PUSHR, used i n t h e
<
a c t>
- p a r t , i s a PLATON f u n c t i o n which p u t s t h esecond a r g u m e n t on t h e head of the f i r s t argument
( S U M U P
-I-, a c t ) Aftert h i s r u l e is a p p l i e d , t h e c o n t r o l will enter the s t a t e SUMUP a g a i n ( S W
-I-,
end). That is, t h i s rtile is a p p l i e d u n t i l there a r e no ~ n t e g e r s before the
f i r s t
'
X' o r the sum of the i n t e g e r s . exceeds t e n . A s theresult,
the environment i s the following:s t r u c t u r e under p r o c e s s i n g
=
qY
5 (X (SUM 6 ) ) 3 1( X
( S U M 0)) 2 2( X
( S U M 0)) )r e l a t i o n s h i p t e m p o r a r i l y fixed be tween i n t e g e r s and
'X'
c o n t e n t of /REG
= ( 2 1 3 )
The
second r u l e of SUMUP w i l l be a p p l i e d n e x t . T h i s rule checks by its1 con> p a r t whether the r e s u l t a t hand s a t i s f i e s the t h i r d condition, t h a t i s , t h e c o n t e x t u a l r e s t r i c t i o n . Because t h e c o n t e n t of /RESULT is
NIL,
t h e f u n c t i o n CONTEXTCHECK r e t u r n s the v a l u e T (SllMUl' -2-, con). So t h i s
r u l e i s a p p l i c a b l e . C o n t r o l makes t h e s t a t e - t r a n s i t i o n t o the s t a t e
BACKTRACK
(SUMUP -2-, end.) Because t h e first r u l e ofBACKTRACK
i s a NEXTB-type y u l e , s t a t e - s a v i n g i s performed. That i s , tbe f o l l o w i n g environment i s
saved: c o n t e n t of /REG = ( 2 1 3)
c o n t e n t of
/RESULT
= NILs t r u c t u r e under processing =
49
By t h i s
r u l e s ,
the
registers /REG
and
/RESULT are
s e t asfollows
( BACKTRACK-I-,
a c t ) ./REG
: =NIL
/RESULT: = ( (
X
(SUM6
) ) )And t h e structure is transformed t o
(X
5
3
1 (X (SUM
0))
2
2(X
(SUM 0)) )A
XEXTB-
t y p erule
causesa
s t a t e t r a n s i t i o n a s does a NEXT-typerule.
Socontrol returns
to
thestate
SUMUP (BACKTRACK-I-,
end)
.
At
tt.is
s t a t e ,a
process
s i m i l a r
t o t h eone
described above
is
performed.As
aresult, the
following governor-dependent
relationships
are est a b l i s h e d .
Here
t h e
b o l dl i n e s
i n d i c a t e t h enewly
e s t a b l i s h e d
r e l a t i o n s h i p s .By
thef i r s t
rule of
BACKTRACKthe
following
environment
is
saved.
c o n t e n t
o f/REG
= (5
31
)conten& of
/RESULT
= ( (X
(SLW 6
) ) )structure under processing
=(#
(X
(SUM
9))
2 2
(X
(SUM
0)) )And /REG
and
/RESULT are set asthe
following
(BACKTRACK -1-,
act)
.
/REG: = N I L
/RESULT: = ( ( X ( S U M
6))
(X (SUM
9 ) ) )The
transformed
structure
is
(BACKTRACK-10, end)
Cjl
2
2
( x
( S U M O ) )1
The
control
is
transferred
to
t h e s t a t e SUMUP. By a p p l y i n g t h e f i r s trule
of
t h i s
s t a t e r e p e a t e d l y on t h e above structure t h e f o l l o w i n g structure iso b t a i n e d .
50
However
this
result does not s a t i s f yt h e
contextual restriction.
So the application of t h e second r u l e of SLJMl.JP f a i l s because t h e f u n c t i o n
CONTEXTCHECK
used i n <con ) - p a r t r e t u r n s the valueNIL
(SUMUP-2-,
con)That
is :con textcheck [ (
(X
(SUM 6 ) )(X
(SUM9 )
) ) :(X
( S U M 4 ) )1
= N I LThe
third rule,
therefore, w i l l b e a p p l i e d next. Because t h i s rule i s a %type r u l e ( S U M U P -3-, e n d ) , it c a u s e s an error and c o n t r o l comes back t ot h e
p o i n t at which a NEXTB-type r u l e w a s a p p l i e dmost
recently.The
saved n v i r o ~ m e n t is r e s t o r e d .This
is:/REG: = ( 5 3 1 )
/RESULT: = (
(X
(SUM6 ) )
)s t r u c t u r e under processing: = (*
(X
(SUM 9 ) )2
2(X
(SUM 0)) )Then by applying the second rule of
BACKTRACK,
the governor-governedr e l a t i o n s h i p e s t a b l i s h e d l a s t l y in t h e p r e v i o u s p r o c e s s i s c a n c e l l e d .
The
s t r u c t u r e and t h e r e g i s t e r / R E G are changed as below(BACKTRACK
-2-, a c t ) :/REG:
= ( 3 1 )s t r u c t u r e under p r o c e s s i n g : ( 5 ( X
(SUM
4))
22
(X
(SUM
0 ) ) ) C o n t r o l e n t e r s t h eBACKTRACK
state again. The a p p l i c a t i o n ofthe
first
r u l e saves the e n v i r o n ~ w n t :content
of/REG
= ( 31
)content of /RESULT = (
(X (SUM
6 ) )s t r u c t u r e under processing =
(1<
5
(X
(SUM
4))
22
(X
(SUM
0 ) ) ) That i s , t h e r e l a t i o n s h i p i n d i c a t e d by t h e d o t t e d l i n e i n the f o l l o w i n g i sControl t r a n s i t s t o t h e s t a t e SUMUP
(BACKTRACK
-I-, e n d ) and a similar process i s performed. However, because the governor-governedrelationship between t h e i n t e g e r 5 and t h e s e c o n d
'Xr
i s c a n c e l l e d , t h e sumof the i n t e g e r s governed by t h e f i r s t
'X',
( 2 1 3 ) , i s g r e a t e r than t h a tof the second
'
X'
,
( 3 1 ).
The contextual c o n d i t i o n , t h e r e f o r e , i s n o t fulfilled, and the a p p l i c a t i o n of the second r u l e of SUMUP will n o t s u c c e e d .So the t e m p o r a r i l y established relationships w i l l be c a n c e l l e d one-by-one as
follows.
A f t e r these r e l a t i o n s h i p s have been c a n c e l l e d , the desired r e s u l t i s o b t a i n e d