A Framework for the Development of Natural Language Grammars
Massimo MARINO
Department of Linguistics
University of Pisa
Via S. Maria 36, I-56100 Pisa - ITALY
Electronic Mail: MASSIMOM@ICNUCEVM.BITNET
Abstract
This paper describes a parsing system used in a framework for the development of Natural
Language grammars. It is an interactive environment suitable for writing robust NL applications
generally. Its heart is the SAIL parsing algorithm that uses a Phrase-Structure Grammar with
extensive augmentations. Furthermore, some particular parsing tools are embedded in the system,
and provide a powerful environment for developing grammars, even of large coverage.
1. Introduction
Every parsing system should embed a set of tools or mechanisms which provide an aid
in treating a minimum set of linguistic phenomena. In designing SAIL we have mainly taken into
account the generality of the parsing system in order to give wide freedom to the grammar designer,
so as to investigate many possible solutions in grammar design and adopt the best of them.
SAIL (System for the Analysis and Interpretation of Language) is the parsing algorithm of the SAIL
Interfacing System (SIS) (/Marino 1988a/, /Marino 1988b/, /Marino 1989/), and precisely because of
its generality the design has been driven by some general aspects which derive from various
theoretical as well as computational accounts.
1. Whatever representation is adopted for the structure of the parsed sentences, it is agreed that
complex sets of syntactic and/or semantic features must describe the linguistic units. Therefore, it is
necessary to provide feature handling mechanisms. This point suggested to us a way of providing
a very rich language for handling feature structures (FSs in the following). FSs are represented as
trees where each arc is labelled by an attribute, and a node can hold either pointers to the following
alternative paths or a pointer to a leaf node where the value for the path spanned so far is found.
They can store many kinds of information thanks to the efficient processing provided by a core set
of functions.
2. Some linguistic phenomena encountered in parsing NL, such as long-distance dependency or
the ability to treat some context-sensitive cases, led us to see the SAIL grammar rules as processes
executed by a processor, a role covered by the parser. The rules of a grammar have associated with
them some information related to their status as processes, which are scheduled in a priority queue
according to their priority of execution (/Knuth 1973/, /Aho et al. 1983/). This also allows, for
instance, the execution of some rule to be requested in order to perform context-sensitive
recognition, or some rules to exchange information with each other in the form of messages in order
to treat long-distance dependency.
3. The parser is structured as a bottom-up (shift-reduce) all-paths algorithm, and a formalism
for the grammar rules was defined to allow syntactic processing in parallel with semantic
processing. The grammar of SAIL is a Phrase-Structure Grammar (PSG) with extensive
augmentations, so that we also take advantage of the compositionality principle naturally
embedded in bottom-up parsers. As mentioned above, the parser is seen as a processor, thus one of its
main tasks is to schedule the processes/rules to run in a priority queue. This queue is not completely
under control of the parser since the grammar rules and the dictionary can also issue some specific
operations or requests about the management of the scheduling task.
4. The need for a flexible front-end for the user is of primary importance to provide a powerful
and complete development environment. The user interface built over SAIL, the SIS, is the
framework where a user can interact with the underlying parsing system in developing grammars.
This interface provides a set of commands, defined by means of a semantic grammar, that are caught
and processed by SAIL and can handle many possible requests of the user.
In the following section we give a brief description of the grammar and dictionary format and
how a grammar is defined in SAIL. Section 3 gives an overview of the SAIL parsing system, parser
organization, and data structures it uses. Section 4 describes the parsing tools available in the
system and their purposes. Finally, Section 5 shows just one example of a grammar fragment where
some parsing tools described in the previous sections are used.
2. The SAIL Grammar
The Grammar Format
The formalism we adopt to express grammar rules, called Complex Grammar Unit (CGU),
defines a syntactic and a semantic side called syntactic rule and semantic rule, respectively. The
syntactic rule contains the production, the tests, the actions and the recovery actions. The semantic
rule contains the semantic counterpart of the syntactic tests and actions. The presence of the
syntactic/semantic recovery actions is a very powerful means to undertake alternative actions
when the rule fails either in matching the right-hand side of the production or in checking the
syntactic/semantic tests. In this way the rules need not be crudely rejected when they fail but, for
instance, they can activate other rules that could be applied successfully.
A rule in SAIL is written defining all the previous CGU's items. In addition, it is also necessary to
provide the status of the rule/process, so that it can be properly taken into account by the parser. The
status says whether a rule can be scheduled for application or not by the parser. It can be active or
inactive. Active rules are always scheduled by the parser, whereas inactive rules are not (inactive
rules can be seen as sleeping rules). The status plays a central role in the organization of a grammar.
As an example, if a rule detects some right or wrong conditions in the parsing structure it can either
set active or activate an inactive rule.
Summarizing, a grammar rule is composed of three main items: 1) the status (active or inactive);
2) the production in context-free (CF) format, in the following denoted by A <- w1 ... wn, n ≥ 1, where
the left-arrow means that the left-hand side is reduced from the right-hand side according to the
bottom-up strategy of parsing; 3) the augmentations.
The production is augmented with an additional item, called the son-flag list. This list says for every
category in the right-hand side whether the corresponding node matched in the parsing structure
must be considered as a son of the left-hand side or not. If a son-flag is set to + for a right-hand side
category the corresponding matched node is a son of the left-hand side node, otherwise it is not a son
node if the flag is -. We have two types of production depending on its structure: CF and
context-sensitive (CS) productions. CF productions, represented by A <- w1 ... wn, are defined like:
    (A (w1 ... wn) (+ ... +))
where all nodes matched by the right-hand side must be sons of the left-hand side node. CS
productions are defined like:
    (A (c1 ... cp w1 ... wn cp+1 ... cq) (- ... - + ... + - ... -))
where only the nodes with a plus flag inside a context of minus-flagged nodes are sons of the
left-hand side node. [2]
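The role of the son-flag list can be illustrated with a minimal sketch, written in Python rather than in the Lisp of SAIL itself. The class `Production` and its methods are hypothetical names introduced only for this illustration; the sketch assumes that one flag is supplied per right-hand side category, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """A production A <- w1 ... wn with one son-flag per right-hand side category.

    Hypothetical illustration, not the SAIL data structure.
    """
    lhs: str
    rhs: tuple
    son_flags: tuple  # '+' = matched node becomes a son, '-' = context only

    def __post_init__(self):
        if len(self.rhs) != len(self.son_flags):
            raise ValueError("one son-flag is required per right-hand side category")
        if any(flag not in ("+", "-") for flag in self.son_flags):
            raise ValueError("son-flags must be '+' or '-'")

    def is_context_free(self):
        # A CF production flags every right-hand side category with '+'.
        return all(flag == "+" for flag in self.son_flags)

    def sons(self, matched_nodes):
        """Return the matched nodes that become sons of the left-hand side node."""
        return [node for node, flag in zip(matched_nodes, self.son_flags) if flag == "+"]


# A CF production and a CS production with one context category on each side.
cf = Production("NP", ("Det", "Noun"), ("+", "+"))
cs = Production("A", ("c1", "w1", "w2", "c2"), ("-", "+", "+", "-"))
```

In this sketch the context nodes of the CS production are matched like any other right-hand side category but are simply not attached to the new node.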
The augmentations cover the syntactic and semantic tests and actions of the CGU model. They are
the body of a rule and are pieces of Lisp code executed by the parser during the application of the rule.
Status, production and augmentations are the information provided by the grammar writer for every
rule of a grammar. A rule is a named instance of a complex data structure defined according to the
following defrule format:
    (defrule
        :gname                 gname
        :rname                 rname
        :production            <production> [<son-flag-list>]
        [:status               <status>
         :syn-tests            <code>
         :sem-tests            <code>
         :syn-actions          <code>
         :sem-actions          <code>
         :syn-recovery-actions <code>
         :sem-recovery-actions <code>] )
gname is the grammar name where the rule rname is defined. These two names must be provided in
every rule definition since in the SIS we can have more than one grammar available, each of which
must be referred to by a name. A grammar usually is defined by a defgramm declaration of the form:
    (defgramm gname [root])
where root is the root category of gname. This declaration sets up all data structures for the
grammar being defined and must be issued before any rule definition.
The Dictionary Format
Any dictionary of a grammar contains a set of forms that are associated with a set of syntactic
and semantic information. A form is any sequence of words w1 w2 ... wn. When n = 1 we have a
single form, otherwise a multiple form (n > 1). For any form, be it single or multiple, the first word
w1 is called the key form. The key form is the means for storing and retrieving all information of
the whole form in the data structures built by the defgramm declaration. Any form has associated
three kinds of information, forming an interpretation: syntactic category; semantic value; a set of
features. A form can have more than one interpretation. In this case, a set of interpretations must be
defined supplying as the first item the key form; afterwards, for every sequence of words following
the key form, the set of interpretations.
[2] This definition leaves the user free to define rules with discontinuous constituents in the
syntactic representation. Currently the parser does not embed any strategy for a full treatment of
these cases since the classical definition of adjacency is implemented. This structure was initially
motivated in order to define CS rules by only one rule, and not by two (see Section 4). Furthermore,
such a structure allows a faster search in the parsing structure, performed by the matcher of the
production, when, for instance, far constituents must be identified for long-distance tasks. Anyway,
stated the important role that can be covered by the representation of discontinuous constituents
An entry of the dictionary is defined according to the following format:
    (defentry keyform gname
        (defform form
            (set-int :category  <category>
                     [:semval   <semval>
                      :features <features>] )+ )+ )
where keyform must be a string of just one word, e.g., "dog", "train", etc.; the form must be either
the null string "" for the single form keyform, or a string of one or more words. Every form
definition of this kind is said to be in defentry format. <category> is the syntactic category and
<semval> is the semantic value. The features must be provided in the following format:
    <features>   ::= ( ((<attributes>) (<value>))+ )
    <attributes> ::= a sequence of feature attributes
    <value>      ::= a value for the feature attributes
As an example:
( ((GENDER) (MASC))
((NUMBER) (SING))
((KIND-OF ARG1) (THING)) )
Here are some examples of dictionary entries. The most trivial of them is:
    (defentry "train" my_grammar
        (defform ""
            (set-int :category Noun)))
where the single form train is defined by one interpretation of category Noun. An example of a
single form with two interpretations is the following:
    (defentry "tree" my_grammar
        (defform ""
            (set-int :category Noun
                     :features ( ((KIND-OF OBJ) (PLANT)) ))
            (set-int :category Noun
                     :features ( ((KIND-OF OBJ) (DATA-STRUCTURE)) ))))
where tree is defined as a plant and as a data structure. An example of multiple form is:
    (defentry "in" my_grammar
        (defform ""
            (set-int :category Prep))
        (defform "the"
            (set-int :category CompPrep)))
where in is defined as a preposition and in the as a compound preposition.
The Feature Structures
In the current system we have adopted a data structure that can be at the same time efficient to
process, homogeneous and reusable in various places of the system. This is why the same data
structures are processed at different times in different places of the system. For instance, the lexical
information looked up from the dictionary is stored at parsing time in the terminal nodes of the
parsing structure the parser builds. Thus, it is natural to give the same format to the data in the
dictionary and in the nodes of the parsing structure. Feature structures, in their classical definition
as sets of attribute-value pairs, are associated with each interpretation of any form in the dictionary
and of any node in the parsing structure. FSs are treated as trees, and it is possible to manage
them by means of a package called Feature Structure Handler (FSH), allowing the main operations
on FSs such as creation, modification, deletion. Currently, this package contains 12 main operations
that can be applied on FSs. Over this set of low level operations on FSs we have developed a set of
graph functions accessible by the user, which act on the FSs associated with the nodes of the parsing
structure.
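The tree view of FSs can be sketched as follows, in Python rather than in the Lisp of the FSH package. The functions `fs_create`, `fs_set`, `fs_get` and `fs_delete` are hypothetical names standing in for creation, modification, access and deletion; they are not the 12 FSH operations, which are not listed here. The sketch assumes that an FS is a tree whose arcs are attributes and whose leaves hold values, so that a path such as (KIND-OF ARG1) leads to one value.

```python
def fs_create(features=()):
    """Build an FS from (path, value) pairs, e.g. (("KIND-OF", "ARG1"), "THING")."""
    fs = {}
    for path, value in features:
        fs_set(fs, path, value)
    return fs


def fs_set(fs, path, value):
    """Store value at the leaf reached by following the attributes in path."""
    if not path:
        raise ValueError("a feature path needs at least one attribute")
    node = fs
    for attribute in path[:-1]:
        child = node.get(attribute)
        if not isinstance(child, dict):
            child = {}  # create a missing arc, or turn a leaf into an inner node
            node[attribute] = child
        node = child
    node[path[-1]] = value


def fs_get(fs, path, default=None):
    """Return the value (or subtree) found at path, or default if the path is absent."""
    node = fs
    for attribute in path:
        if not isinstance(node, dict) or attribute not in node:
            return default
        node = node[attribute]
    return node


def fs_delete(fs, path):
    """Remove the arc at the end of path; return True if something was removed."""
    if not path:
        return False
    node = fs
    for attribute in path[:-1]:
        node = node.get(attribute)
        if not isinstance(node, dict):
            return False
    if path[-1] in node:
        del node[path[-1]]
        return True
    return False


# The feature example of the dictionary format, as (path, value) pairs.
example = fs_create([
    (("GENDER",), "MASC"),
    (("NUMBER",), "SING"),
    (("KIND-OF", "ARG1"), "THING"),
])
```

Nested dictionaries are used here only because they make the path-based access explicit; the actual representation inside FSH may differ.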
Rules with Non-Operative Productions (NOP Rules)
When non-operative productions are defined in some rule they do not build a new node, but can
perform various actions, such as activating other rules, or altering semantic structures. There are
three types of non-operative productions depending on the NOP category used in the left-hand side:
    { <NOP> | <NOP-A> | <NOP-SE> } <- w1 ... wn
If <NOP> is used then only the syntactic rule is applied and the semantic rule is never considered.
Only the semantic rule is applied and the syntactic one is ignored by using the category
<NOP-SE>. Finally, both the rules are applied by using the category <NOP-A>. As we shall see in
Section 4, this kind of production can be useful in CS recognition, providing an alternative way for
defining CS rules. Moreover, NOP rules are also useful when it is necessary to control the activation
of real rules, with the objective of limiting the indeterminism of the parser.
3. Overview of the SAIL Parsing System
In this section we describe briefly the parser, the data structures it handles, and how it works.
Starting from the FSH core package, we have adopted this data structure wherever possible inside
the parsing system, as the figure below shows. The parser builds a parsing structure under the form of
a graph, where each node contains two kinds of information: an internal structure of data used by
the parsing algorithm only, and the linguistic (syntactic and semantic) information set by the
grammar rules. Both these structures are represented in a unique FS managed by the parser and the
running grammar by using the underlying FSH functions. Any source grammar must have a set of
rules and a set of dictionary forms written in the formats described previously. Grammar rules can
make use of two sets of functions: the graph functions, which use the FSH package to update the
linguistic structures of the graph, and the parser management functions to handle the various
parsing tools and mechanisms (see Section 4).
The parser is a CF-based one, originally derived from the ICA (Immediate Constituent Analysis)
algorithm described in /Grishman 1976/. It is a bottom-up shift-reduce action-based algorithm,
performing left-to-right scanning and reduction in an immediate constituent analysis. The data
structure it builds is a graph of nodes that can be terminal or non-terminal. Terminal nodes are built
in correspondence to a scanned form, whereas non-terminals are built whenever a rule (other than a
NOP rule) is applied.
The parsing system was designed to view the grammar rules as processes to be executed, and the
parser as the processor. At any moment, the parser, following a priority scheme, handles a queue of
processes awaiting execution. In fact we can have different types of rules with different priorities of
execution. So it is possible that a rule, when applied, sends a request for execution of another rule,
inserting the called rule in the appropriate position in the queue. After a scanning or a reduction, the
parser gets a set of active rules which are the applicable rules at that moment. When the parser takes
such a set - called a packet - for every rule in the packet it builds a process descriptor and inserts
it in the queue. We call such a process descriptor an application specification (AS), while the queue is
called the application specification list (ASL). ASs are composed of all the information necessary
to execute the process in the proper context. ASs in a given ASL are ordered depending upon
the rule involved in an AS. In general, if standard active rules have to be executed, ASL is handled
with a LIFO policy. The parser performs all possible reductions, building more than one node if
necessary, extracting one AS at a time before analyzing the next one. After an AS is extracted from
ASL the parser searches a match for the right-hand side on the graph. The matching, if successful,
returns one or more sets of nodes, called reduction sets. For every reduction set, the application of
the rule is tried. In this way we can connect together all possible parses for a sentence in a unique
structure. The complete algorithm of the parser is therefore:
Until the end of the sentence is reached:
    Scan a form:
        build a new terminal node for the scanned form;
        For every interpretation of the node:
            get the packet of rules corresponding to its category and for every rule in
            the packet insert in ASL the AS;
    For every AS in ASL:
        get the first AS from the top of ASL;
        get the rule specified in the AS, it is the current rule, and access the node
        specified in the AS, it is the current node;
        starting from the current node perform the match on the graph using the
        production of the current rule;
        if at least one reduction set is found then:
            For every reduction set:
                if the tests of the current rule hold then:
                    execute the actions of the current rule;
                    if a new non-terminal node is built then:
                        get the packet of rules corresponding to its category and
                        for every rule in the packet insert in ASL the AS;
                else:
                    apply the recovery actions of the current rule;
        else:
            apply the recovery actions of the current rule;
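The control flow of the algorithm above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SAIL implementation: all rules are active CF rules without tests, actions or recovery actions; the packet of a category is assumed to hold the rules whose right-hand side ends with that category; one terminal node is built per interpretation; and the names `Node`, `match_backwards` and `parse` are hypothetical.

```python
from collections import defaultdict, namedtuple

# A node of the parsing graph: category, covered span [start, end) and sons.
Node = namedtuple("Node", "category start end sons")


def match_backwards(rhs, current, ending_at):
    """Return the reduction sets: node sequences matching rhs and ending with current."""
    reduction_sets = [[current]]
    for category in reversed(rhs[:-1]):
        reduction_sets = [
            [node] + matched
            for matched in reduction_sets
            for node in ending_at[matched[0].start]
            if node.category == category
        ]
    return reduction_sets


def parse(forms, lexicon, rules):
    """Scan forms left to right and perform all possible reductions."""
    packets = defaultdict(list)          # category -> rules ending with it (assumption)
    for lhs, rhs in rules:
        packets[rhs[-1]].append((lhs, rhs))

    graph = []                           # every node built so far
    ending_at = defaultdict(list)        # end position -> nodes ending there
    asl = []                             # application specification list, LIFO

    def add_node(node):
        graph.append(node)
        ending_at[node.end].append(node)
        for rule in packets[node.category]:
            asl.append((rule, node))     # one AS per rule in the packet

    for position, form in enumerate(forms):
        for category in lexicon[form]:   # one terminal node per interpretation
            add_node(Node(category, position, position + 1, ()))
        while asl:
            (lhs, rhs), current = asl.pop()
            for reduction_set in match_backwards(rhs, current, ending_at):
                add_node(Node(lhs, reduction_set[0].start, current.end,
                              tuple(reduction_set)))
    return graph


toy_lexicon = {"the": ["Det"], "dog": ["Noun"], "barks": ["Verb"]}
toy_rules = [("NP", ("Det", "Noun")), ("S", ("NP", "Verb"))]
toy_graph = parse(["the", "dog", "barks"], toy_lexicon, toy_rules)
```

Because every reduction set produces a node and every new node queues its own packet, the sketch keeps all parses in one graph, which is the all-paths behaviour described in the text.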
4. Parsing Tools
Rule Disabling/Enabling Operations
As stated previously, rules can assume two different states, active or inactive. The rule's state is
determined at the moment of rule definition. In addition, it is possible to change the state during the
parse by using two specific functions. In the application of a rule, others may be changed from active
to inactive, performing a disabling operation, or changed from inactive to active, performing an
enabling operation. It is possible to change the state of one or more rules at a time and the rules can
also perform self-enabling and self-disabling operations. Changes of state effected during a parsing
are not permanent. At the end of each parsing the rules are reconfigured as indicated in their
original definition.
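The non-permanent nature of these state changes can be sketched as follows. The class `RuleStates` and its method names are hypothetical (the two SAIL functions are not named in the text); the sketch only assumes that the defined state is kept aside and restored when a parse ends.

```python
class RuleStates:
    """Tracks active/inactive states; changes last only for the current parse."""

    def __init__(self, defined_states):
        # defined_states maps rule name -> True (active) or False (inactive),
        # as given at rule definition time.
        self._defined = dict(defined_states)
        self._current = dict(defined_states)

    def _set(self, rule_names, state):
        for name in rule_names:
            if name not in self._defined:
                raise KeyError(f"unknown rule: {name}")
            self._current[name] = state

    def enable(self, *rule_names):
        """Enabling operation: inactive -> active, for one or more rules."""
        self._set(rule_names, True)

    def disable(self, *rule_names):
        """Disabling operation: active -> inactive, for one or more rules."""
        self._set(rule_names, False)

    def is_active(self, rule_name):
        return self._current[rule_name]

    def end_of_parse(self):
        """Reconfigure every rule as indicated in its original definition."""
        self._current = dict(self._defined)


states = RuleStates({"np-rule": True, "watch-rule": False})
states.enable("watch-rule")
states.disable("np-rule")
changed = (states.is_active("np-rule"), states.is_active("watch-rule"))
states.end_of_parse()
restored = (states.is_active("np-rule"), states.is_active("watch-rule"))
```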
Dictionary-Driven and Rule-Driven Activation
The mechanism of activation of rules can be used in our parsing system in order to improve the
determinism of the parser. We remark that the parsing algorithm is basically a bottom-up parallel
non-deterministic parser, so that by partitioning a grammar as a set of active and inactive rules, and
driving their application by an activation mechanism, we can achieve a great control on the parser
directly from the grammar, without embedding specific control strategies within the parsing
algorithm.
Activation of rules can be effected during the two main phases of the parser activity: scanning and
reduction. Dictionary-driven activation can be performed when the parser scans a form defined
with an interpretation like the following:
    (set-int :category <category>
             :semval   <semval>
             :features (((queue) (rule-name+))))
The special feature queue advises the parser of a preference for specific rules to apply when the form
is scanned. This preference is independent of the state of the rules specified and the ASs are queued
in ASL without considering the packet corresponding to the category being scanned. As a consequence
of this mechanism of activation, the fifth and sixth line of the parser algorithm must be changed as
follows:
    Get the packet of rules corresponding to its category and for every rule in the packet insert
    in ASL the AS unless the interpretation requires rule activation by the special feature queue. In this
    case insert in ASL the AS of the rules supplied as values of the special feature queue.
Rule-driven activation, at level of reduction task, can be accomplished by using a devoted function,
called rule-activation, whose arguments are the names of the rules to activate, and provides for
queuing ASs in ASL for every name specified. In both the types of activation, the activated rules are
applied just once immediately after the scanning or the termination of the activating rule. The state
of the activated rule is not modified and activation of more than one rule at a time is possible, as
well as nested activations.
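Both kinds of activation can be sketched in Python as follows. The function `rules_to_queue` is a hypothetical name for the modified step of the parser algorithm, and `rule_activation` is only a Python analogue of the rule-activation function mentioned above, whose real signature is not given. The sketch assumes that the ASL is a LIFO list of (rule name, node) pairs, that the features of an interpretation are available as a mapping, and that several activated rules are tried in the order in which they are named.

```python
def rules_to_queue(interpretation, packets, active):
    """Return the rule names whose AS is inserted in ASL for one interpretation."""
    requested = interpretation.get("features", {}).get("queue")
    if requested:
        # Dictionary-driven activation: neither the packet of the category
        # nor the state of the named rules is considered.
        return list(requested)
    packet = packets.get(interpretation["category"], [])
    return [name for name in packet if active.get(name, False)]


def rule_activation(asl, current_node, *rule_names):
    """Rule-driven activation: queue one AS per named rule, whatever its state."""
    # ASL is popped from the end, so pushing in reverse order makes the
    # first named rule the next one to be applied (an assumption).
    for name in reversed(rule_names):
        asl.append((name, current_node))


packets = {"Noun": ["np-rule", "sleeping-rule"]}
active = {"np-rule": True, "sleeping-rule": False}

plain = rules_to_queue({"category": "Noun"}, packets, active)
driven = rules_to_queue(
    {"category": "Noun", "features": {"queue": ["sleeping-rule"]}}, packets, active
)

asl = [("older-rule", "n0")]
rule_activation(asl, "n1", "r1", "r2")
```

Neither function touches the `active` mapping, which mirrors the statement that the state of an activated rule is not modified.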
Context-Sensitive Rules
CS rules were not directly implemented in our parsing system, but they were available by nature
(in addition to the way currently defined in Section 2) thanks to the rule-activation mechanism and
NOP rules. The complete application for a CS production αAβ <- αγβ is made in two steps. The first
one concerns a context determination, the context being represented by the right-hand side of the CS
production, αγβ. The second one is just an application of the CF production A <- γ, if and only if the
context has been determined. This is accomplished by defining a NOP rule for the context
determination as first step. Afterwards, this NOP rule must activate the CF rule as second step,
building the node A in the proper context.
Message Passing
The message passing mechanism is a parsing tool that makes possible asynchronous
operations on linguistic data. This way of processing implies the co-operation between two rules
which interact with each other exchanging some information by means of a sending and a receiving
task performed at the two independent times of rule application. The sending task is performed by
the sending rule at a time T1, sending a message for another rule. This latter rule must perform the
receiving task to receive the message at its execution time T2 (T2 > T1). Since the relevant linguistic
data the parser works on are stored as FSs, the messages are FSs. We have implemented two
approaches of message passing. The first one makes use of a global FS where any rule can store
global features. Any rule during a parse can access this global FS and whatever feature value. This
type of FS is the global counterpart of the FS stored in every node of the graph structure: the FS of a
node is local and can only be accessed by the nodes linked to its node by a direct connection link.
Therefore, there being no right of privacy on features in the global FS, this particular structure must
be accessed with care by the rules since it can be a place of conflicts among them.
The second approach provides a structure that preserves the privacy of the messages. In this case
too the messages are FSs, and they are stored in a sort of mailbox, called the message-box. Any rule
can refer to the message-box to store a message, specifying the destination rule. Conversely, any
rule can refer to the message-box to get messages, and only the messages addressed to it will be
available. Let us consider the two cases shown in the following partial parse-trees.
We suppose that some information, created or raised in the node SN from the terminal side by the
rule Rs, must be used in the node RN built by the rule Rr. (1) shows that the message-box could be
used to bypass the nodes N1, N2. This is useful when (some) data from SN are not relevant for
processing in N1 and N2, with the advantage that no memory space is wasted using the nodes N1, N2 for
raising the data from SN to RN. On the other hand, (2) shows a case where no path exists between SN
and RN. Therefore, the only connection between the nodes can be a common structure accessed by both
of them. The use of the message-box is very easy, since all the work is done by two functions. The
function sendmsg makes a copy of a subset of the FSs of the nodes it can access (i.e., the nodes
corresponding to the left- and right-hand sides of the production) and stores it in the message-box.
The function receivemsg gets a message in the form of an FS and stores it in the node corresponding
to the left-hand side of the production.
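The addressing behaviour of the message-box can be illustrated with a minimal Python sketch. It is not SAIL code: the class MessageBox, its methods send and receive, the rule name Other-rule and the feature names are all hypothetical, FSs are approximated by nested dictionaries, and the sketch assumes that a message is removed from the box once it has been received.

```python
import copy
from collections import defaultdict


class MessageBox:
    """Hypothetical mailbox keeping messages separated by destination rule."""

    def __init__(self):
        self._box = defaultdict(list)

    def send(self, destination, fs):
        # Store a deep copy, mirroring the idea that a copy of the FS is stored.
        self._box[destination].append(copy.deepcopy(fs))

    def receive(self, rule_name):
        # Only messages addressed to rule_name are visible to it.
        # Assumption of this sketch: receiving also empties that mailbox slot.
        return self._box.pop(rule_name, [])


box = MessageBox()
box.send("Rr", {"agreement": {"number": "plural"}})  # stored while building SN
box.send("Other-rule", {"case": "nominative"})       # addressed to another rule

for_rr = box.receive("Rr")                           # retrieved while building RN
```

In this sketch the rule Rr never sees the message addressed to Other-rule, which is the privacy property described above, and no intermediate node has to carry the data.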
5. An Example: SAILing X and X and ... X
The example shows a fragment of a grammar whose aim is to drive the parser according to a specific
recognition strategy, achieving as a result an optimized parsing structure, i.e., only the minimum
number of strictly necessary nodes is built.
The recognition of indefinitely long clauses of the form X and X and ... X could be achieved by
using the productions: AND <- NP *and NP, AND <- AND *and NP, where, for instance, X can be an NP
and *and is the category of and. These productions produce a parsing structure of the kind shown
below. With k the number of conjunctions, the number of nodes N(k) built by these two productions is
given by: N(k) = TN(k) + NTN(k), TN(k) = 2k + 1, NTN(k) = (1/2) k (k + 1).
[Figure: parsing structure for NP *and NP *and NP *and NP *and NP, rooted in AND]
TN(k) determines the number of terminal nodes, and NTN(k) the number of non-terminal nodes. For the
graph above N(4) = 19, since TN(4) = 9 and NTN(4) = 10. This kind of parsing structure is not
optimized; moreover, N(k) is a quadratic function of k. In the figure above we have drawn in
boldface lines the parsing structure with the minimum number of nodes we want. For this optimized
structure NTN(k) is a linear function of k: NTN(k) = k. Therefore, the formula for the optimized
case N0(k) is: N0(k) = TN(k) + k = 3k + 1.
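The two node counts can be compared numerically. The following Python fragment simply evaluates the formulas given in the text; the function names are ours and do not belong to SAIL.

```python
def terminal_nodes(k):
    # TN(k) = 2k + 1: k conjunctions separate k + 1 conjuncts.
    return 2 * k + 1


def nonterminal_nodes(k):
    # NTN(k) = (1/2) k (k + 1) for the non-optimized structure.
    return k * (k + 1) // 2


def total_nodes(k):
    # N(k) = TN(k) + NTN(k)
    return terminal_nodes(k) + nonterminal_nodes(k)


def optimized_nodes(k):
    # N0(k) = TN(k) + k = 3k + 1, one AND node per conjunction.
    return terminal_nodes(k) + k


for k in (1, 2, 4, 8, 16):
    print(k, total_nodes(k), optimized_nodes(k))
```

For k = 4 the first count gives the 19 nodes of the graph above, against 13 nodes for the optimized structure, and the gap grows quadratically with k.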
Our grammar fragment is based on a watch-rule, called Check-and-rule, that checks whether the parser
has already built a node of category AND followed by *and NP. This rule has the production:
<NOP> <- AND *and NP, and if its right-hand side has no match it means that the first node AND has
to be built. Check-and-rule has the following definition.
(defrule
  :gname my_grammar
  :rname Check-and-rule
  :production (<NOP> (AND *and NP))
  :status active
  :syn-actions
    (rule-activation '(Make-and-rule NP))
  :syn-recovery-actions
    (rule-activation '(Make-first-and-rule NP)))
The syn-actions are applied if the right-hand side has a match, and the rule Make-and-rule is
activated to build a non-terminal node AND. The syn-recovery-actions are applied when the parser has
to build a node AND for the first time, and the rule Make-first-and-rule is activated. These two
activated rules must be inactive, since the watch-rule has the task of activating them.
(defrule
  :gname my_grammar
  :rname Make-first-and-rule
  :production (AND (NP *and NP))
  :status inactive)
(defrule
  :gname my_grammar
  :rname Make-and-rule
  :production (AND (AND *and NP))
  :status inactive)
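The effect of the three rules can be traced with a small Python simulation. This is only a sketch under strong assumptions: it reads the input sequentially with a stack, whereas SAIL follows a parallel bottom-up strategy, and the function build_and_chain and its data layout are hypothetical. It shows which rule the watch-rule would activate at each step and how many AND nodes result.

```python
def build_and_chain(categories):
    """Reduce NP *and NP *and ... NP, recording which rule is activated."""
    stack = []       # categories, or built nodes ("AND", left, "*and", "NP")
    activated = []   # names of the rules activated by the watch-rule
    for category in categories:
        stack.append(category)
        # Check-and-rule has something to do only after "*and NP" is read.
        if len(stack) < 3 or stack[-2:] != ["*and", "NP"]:
            continue
        left = stack[-3]
        if isinstance(left, tuple):
            # Right-hand side AND *and NP matched: syn-actions.
            activated.append("Make-and-rule")
        elif left == "NP":
            # No AND node yet: syn-recovery-actions.
            activated.append("Make-first-and-rule")
        else:
            continue
        stack[-3:] = [("AND", left, "*and", "NP")]
    return stack, activated


k = 4
tokens = ["NP"] + ["*and", "NP"] * k
stack, activated = build_and_chain(tokens)
total_nodes = len(tokens) + len(activated)   # terminal nodes + AND nodes
```

With k = 4 the simulation activates Make-first-and-rule once and Make-and-rule three times, so exactly k non-terminal nodes are built and the total is 3k + 1 = 13 nodes.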
6. Final Remarks
Some remarks about the parsing system and the parsing tools described so far are in order. A first
point concerns the priority assigned to rules. It is clear that we can have three main kinds of
rules: activated rules, NOP rules and standard rules. This is also their decreasing priority order
of execution: activated rules have the highest priority since they are a natural completion and
extension of the activating rule; NOP rules can affect structures used by standard (non-NOP) rules
in their packet, therefore they need to be properly scheduled with a higher priority than the
others. Furthermore, this classification shows how CGUs are not a mere place for a declarative
description of a grammar, but they are also a place where a procedural description of actions
concerning the parsing process can be given. This is a powerful way, when conditions are detected,
of altering the natural behaviour of the parser, which follows a parallel bottom-up,
non-deterministic strategy. The actions taken follow the detection of some situation in the parsing
structures, e.g., the activation of one rule instead of another when a misspelling is found in the
input, and a parsing process can be driven by a grammar where only the rules necessary for context
detection are set active and those devoted to building structures are inactive. This way of setting
control of the parser places this parsing system in the category of situation-action parsers
(/Winograd 1983/).
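The priority order among the three kinds of rules can be pictured with a short Python sketch. It is an illustration only: the numeric priorities, the function schedule and the rule name Some-standard-rule are assumptions of the sketch, and nothing is implied about how SAIL actually schedules rules internally.

```python
# Smaller number = executed earlier (assumed encoding of the stated order).
PRIORITY = {"activated": 0, "nop": 1, "standard": 2}


def schedule(rules):
    """Order (name, kind) pairs by decreasing priority.

    sorted() is stable, so rules of the same kind keep their relative order.
    """
    return sorted(rules, key=lambda rule: PRIORITY[rule[1]])


pending = [
    ("Some-standard-rule", "standard"),
    ("Check-and-rule", "nop"),
    ("Make-and-rule", "activated"),
]
ordered = [name for name, _ in schedule(pending)]
```

Here the activated rule runs first, the NOP watch-rule second and the standard rule last, whatever the order in which they were collected.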
Unfortunately this paper cannot be a place for a wide description of examples of grammars using the
parsing tools of SAIL. Some running examples, as well as the one described above, can be found in
/Marino 1988a/. Moreover, some ill-formed input cases have been faced, e.g., lexical/syntactic
ill-formedness, constraint violation, constituent shuffling, missing constituents, in
/Ferrari 1989/. A wide report of the work developed in the framework of the European ESPRIT Project
P527 CFID using the SAIL Interfacing System is in /Deliverable 9/ where, among other things, the
description of an English grammar and semantics is shown (/Mac Aogain et al. 1989/).
The author is thankful to Giacomo Ferrari, who made this work possible.
References
/Aho et al. 1983/ Aho, A. V., Hopcroft, J. E. and Ullman, J. D. 1983. Data Structures and
Algorithms. Addison-Wesley, Reading, Mass.
/Bunt et al. 1987/ Bunt, H., Thesingh, J. and van der Sloot, K. 1987. Discontinuous Constituents in
Trees, Rules, and Parsing. Proceedings of the 3rd Conference of the European Chapter of the ACL.
Copenhagen, Denmark, pp. 203-210.
/Deliverable 9/ Deliverable 9: Implementation of Dialogue System. 1989. Ref. CFID.D9.2. ESPRIT
Project 527 (CFID).
/Ferrari 1989/ Ferrari, G. 1989. The Treatment of Ill-Formed Input within the Frame of SAIL.
Working Paper. ESPRIT Project 527 (CFID).
/Grishman 1976/ Grishman, R. 1976. A Survey of Syntactic Analysis Procedures for Natural Language.
American Journal of Computational Linguistics. Microfiche 47, pp. 2-96.
/Knuth 1973/ Knuth, D. E. 1973. The Art of Computer Programming. Vol. III: Sorting and Searching.
Addison-Wesley, Reading, Mass.
/Marino 1988a/ Marino, M. 1988. The SAIL Interfacing System: A Framework for the Development of
Natural Language Grammars and Applications. Technical Report DL-NLP-88-1. Department of
Linguistics, University of Pisa.
/Marino 1988b/ Marino, M. 1988. A Process-Activation Based Parsing Algorithm for the Development of
Natural Language Grammars. Proceedings of 12th International Conference on Computational
Linguistics. Budapest, Hungary, pp. 390-395.
/Marino 1989/ Marino, M. 1989. SAIL: A Prototype Environment for Writing NL Applications. In
/Deliverable 9/.
/Winograd 1983/ Winograd, T. 1983. Language as a Cognitive Process. Vol. 1: Syntax.
Addison-Wesley, Reading, Mass.