TRANSPORTABLE NATURAL-LANGUAGE INTERFACES TO DATABASES by
Gary G. Hendrlx and William H. Lewis SRI International
333 Ravenewood Avenue Menlo Park, California 94025
I INTRODUCTION
Over the last few years a number of
application systems have been constructed that a l l o w u s e r s t o a c c e s s d a t a b a s e s by p o s i n g q u e s t i o n s i n n a t u r a l l a n g u a g e s , such a s E n g l i s h . When used i n t h e r e s t r i c t e d domains f o r which t h e y have been e s p e c i a l l y d e s i g n e d , t h e s e s y s t e m s have a c h i e v e d r e a s o n a b l y h i g h l e v e l s of p e r f o r m a n c e . Such s y s t e m s a s LADDER [ 2 ] , PLANES [ 1 0 ] , ROBOT [ 1 ] , and REL [9] r e q u i r e t h e e n c o d i n g of knowledge a b o u t t h e domain o f a p p l i c a t i o n in s u c h c o n s t r u c t s as d a t a b a s e s c h e m a t a , l e x l c o n s , p r a g n m t i c grammars, and t h e l l k e . The c r e a t i o n of t h e s e d a t a s t r u c t u r e s t y p i c a l l y r e q u i r e s c o n s i d e r a b l e e f f o r t on t h e p a r t o f a c o m p u t e r p r o f e s s i o n a l who h a s had special training in computational l i n g u i s t i c s and
the use of databases. Thus, the utility of these
systems is severely limited by the high cost
involved in developing an interface to any
particular database.
T h i s p a p e r d e s c r i b e s i n i t i a l work on a m e t h o d o l o g y f o r c r e a t i n g n a t u r a l - l a n g u a g e p r o c e s s i n g c a p a b i l i t i e s f o r new domains w i t h o u t t h e need f o r i n t e r v e n t i o n by s p e c i a l l y t r a i n e d e x p e r t s . Our a p p r o a c h i s t o a c q u i r e l o g i c a l s c h e m a t a and l e x i c a l i n f o r m a t i o n t h r o u g h s i m p l e i n t e r a c t i v e d i a l o g u e s w i t h someone who i s f a m i l i a r w i t h t h e form and c o n t e n t of t h e d a t a b a s e , b u t u n f a m i l i a r w i t h t h e t e c h n o l o g y of n a t u r a l - l a n g u a g e i n t e r f a c e s . To t e s t o u r a p p r o a c h i n an a c t u a l c o m p u t e r e n v i r o n m e n t , we have d e v e l o p e d a p r o t o t y p e s y s t e m c a l l e d TED ( T r a n s p o r t a b l e E n g l i s h D a t a m a n a g e r ) . As a r e s u l t o f o u r e x p e r i e n c e w i t h TED. t h e NL group a t SRI i s now u n d e r t a k i n g t h e d e v e l o p = a n t o f a ~ c h more a m b i t i o u s s y s t e m b a s e d on t h e s a n e p h i l o s o p h y
[ 4 ] .
I I RESEARCH PROBLEMS
Given t h e d e m o n s t r a t e d f e a s i b i l i t y o f l a n g u a g e - a c c e s s s y s t e m s , s u c h as LADDER, m a j o r r e s e a r c h i s s u e s t o be d e a l t w i t h i n a c h i e v i n g t r a n s p o r t a b l e d a t a b a s e i n t e r f a c e s i n c l u d e t h e f o l l o w i n g :
* I n f o r m a t i o n used by t r a n s p o r t a b l e s y s t e m s must be c l e a n l y d i v i d e d i n t o d a t a b a s e - i n d e p e n d e n t and d a t a b a s e - d e p e n d e n t p o r t i o n s .
* Knowledge r e p r e s e n t a t i o n s must be e s t a b l i s h e d f o r t h e d a t a b a s e - d e p e n d e n t p a r t i n s u c h a way t h a t t h e i r form i s f i x e d and a p p l i c a b l e t o a l l d a t a b a s e s and t h e i r c o n t e n t r e a d i l y a c q u i r a b l e .
* Mechanisms must be d e v e l o p e d t o e n a b l e t h e s y s t e m t o a c q u i r e information a b o u t a p a r t i c u l a r a p p l i c a t i o n f r o m n o n l i n g u i s t s .
I I I THE TED PROTOTYPE
We have d e v e l o p e d o u r p r o t o t y p e s y s t e m (TED) t o e x p l o r e one p o s s i b l e a p p r o a c h to c h a s e p r o b l e m s . In e s s e n c e , TED i s a LADDER-like n a t u r a l - l a n g u a g e p r o c e s s i n g s y s t e m f o r a c c e s s i n g d a t a b a s e s , combined w i t h an " a u t o m a t e d i n t e r f a c e e x p e r t " t h a t
interviews users to learn the language and logical
s t r u c t u r e a s s o c i a t e d w i t h a p a r t i c u l a r database and t h a t a u t o m a t i c a l l y t a i l o r s t h e s y s t e m f o r u s e w i t h t h e p a r t i c u l a r a p p l i c a t i o n . TED a l l o w s u s e r s t o c r e a t e , p o p u l a t e , and e d i t ~ h e i r own new l o c a l d a t a b a s e s , t o d e s c r i b e e x i s t i n g l o c a l d a t a b a s e s , o r e v e n t o d e s c r i b e and s u b s e q u e n t l y a c c e s s h e t e r o g e n e o u s ( a s i n [ 5 ] ) d i s t r i b u t e d d a t a b a s e s .
Most of TED i s b a s e d on and b u i l t from components of LADDER. In p a r t i c u l a r , TED u s e s t h e LIFER p a r s e r and i t s a s s o c i a t e d s u p p o r t p a c k a g e s [ 3 ] , t h e SODA d a t a a c c e s s p l a n n e r [ 5 ] , and t h e FAM f i l e a c c e s s manager [ 6 ] . A l l of t h e s e s u p p o r t p a c k a g e s a r e i n d e p e n d e n t o f t h e p a r t i c u l a r d a t a b a s e u s e d . In LADDER, t h e d a t a s t r u c t u r e s u s e d by t h e s e components ~ r e h a n d - g e n e r a t e d f o r s p a r t i c u l a r d a t a b a s e by c o m p u t e r s c i e n t i s t s . I n TED, however, t h e y a r e c r e a t e d by TED's a u t o m a t e d i n t e r f a c e e x p e r t .
L i k e LADDER, TED u s e s a p r a g m a t i c g r a n m a r ; b u t TED's p r a g m a t i c gramemr does n o t make any a s s t m p t l o n s a b o u t t h e p a r t i c u l a r d a t a b a s e being a c c e s s e d . I t assumes o n l y t h a t i n t e r a c t i o n s w i t h t h e s y s t e m w i l l c o n c e r n d a t a a c c e s s o r u p d a t e , and t h a t i n f o r m a t i o n r e g a r d i n g t h e p a r t i c u l a r d a t a b a s e w i l l be encoded i n d a t a s t r u c t u r e s o f a p r e s c r i b e d form, which a r e c r e a t e d by t h e a u t o m a t e d i n t e r f a c e e x p e r t .
The e x e c u t i v e l e v e l of TED a c c e p t s t h r e e k i n d s o f i n p u t : q u e s t i o n s s t a t e d i n E n g l i s h a b o u t t h e d a t a i n f i l e s t h a t h a v e been p r e v i o u s l y d e s c r i b e d t o t h e s y s t e m ; q u e s t i o n s p o s e d i n t h e SODA q u e r y l a n g u a g e ; s i n g l e - ~ o r d commands that ~nltlaCe d i a l o g u e s w i t h t h e a u t o m a t e d i n t e r f a c e e x p e r t .
zv THE * . T a ~ A ~ I ~ r ~ F A C ~ ) X ~ R T
A. P h i l o s o p h 7
* There m u s t be a r a n g e o f r e a d i l y u n d e r s t o o d q u e s t i o n s t h a t e l i c i t a l l t h e i n f o r m a t i o n n e e d e d a b o u t a n e w d a t a b a s e .
* The q u e s t i o n s m u s t b e b o t h b r i e f a n d e a s y to u n d e r s t a n d .
* The s y s t e m m u s t a p p e a r c o h e r e n t , ellciting r e q u i r e d information in an o r d e r c o m f o r t a b l e t o t h e u s e r .
* The s y s t e m m u s t p r o v i d e s u b s t a n t i a l a s s i s t a n c e , w h e n n e e d e d , t o e n a b l e a u s e r t o u n d e r s t a n d t h e k i n d s o f r e s p o n s e s t h a t a r e e x p e c t e d .
A l l t h e s e p o i n t s c a n n o t b e c o v e r e d h e r e i n , b u t t h e s a m p l e t r a n s c r i p t s h o w n a t t h e e n d o f t h i s p a p e r t
in conjunction with the following discussion,
s u g g e s t s t h e m a n n e r o f o u r a p p r o a c h .
B. S t r a t e g y
A k e y s t r a t e S y o f TED i s t o f i r s t a c q u i r e
information a b o u t the structure of files. Because
t h e semantics of files is relatively well
understoodt t h e system thereby lays the foundation
for subsequently a c q u i r i n g information a b o u t t h e
linguistic constructions likely t o be used i n
questions about the data contained in the file. One o f t h e s i n g l e - w o r d c o - - - - n d s a c c e p t e d b y t h e TED e x e c u t i v e s y s t e m i s t h e command NEW, w h i c h i n i t i a t e s a d i a l o g u e p r o m p t i n g t h e u s e r t o s u p p l y i n f o r m a t i o n a b o u t t h e s t r u c t u r e o f a new d a t a f i l e . T h e NEW d i a l o g u e a l l o w s t h e u s e r t o t h i n k o f t h e f i l e a s a t a b l e o f i n f o r m a t i o n a n d a s k s r e l a t i v e l y s i m p l e q u e s t i o n s a b o u t e a c h o f t h e f i e l d s ( c o l u m n s ) in t h e file ( t a b l e ) .
For example, TED asks for the heading names of
the columns, for possible synonyms for t h e heading
n a m e s , a n d f o r i n f o r m a t i o n a b o u t t h e t y p e s o f v a l u e s ( n u m e r i c , B o o l e a n , o r s y m b o l i c ) t h a t e a c h c o l u m n c a n c o n t a i n . T h e h e a d i n g n a m e s g e n e r a l l y a c t l i k e r e l a t i o n a l n o u n s , w h i l e t h e i n f o r m a t i o n a b o u t t h e t y p e o f v a l u e s i n e a c h c o l u m n p r o v i d e s a
clue to the column's semantics. The heading name
of a symbolic column tends to he the generic name f o r t h e c l a s s o f o b j e c t s r e f e r r e d t o b y t h e v a l u e s o f t h a t c o l u m n . H e a d i n g n a m e s f o r B o o l e a n c o l u m n s t e n d c o b e t h e n a m e s o f p r o p e r t i e s t h a t d a t a b a s e o b j e c t s can possess. T.f a c o l u m n contains numbers,
thls suggests that there may be some scale wlth
a s s o c i a t e d a d j e c t i v e s of d e g r e e . To a l l o w t h e s y s t e m t o a n s w e r q u e s t i o n s r e q u i r i n g t h e i n t e g r a t i o n of information from m u l t i p l e files, t h e u s e r i s a l s o a s k e d a b o u t t h e i n t e r c o n n e c t i o n s b e t w e e n t h e f i l e c u r r e n t l y b e i n g d e f i n e d a n d o t h e r f i l e s d e s c r i b e d p r e v i o u s l y .
C. E x a m p l e s f r o m a T r a n s c r i p t
I n t h e s a m p l e t r a n s c r i p t a t t h e e n d o f t h i s p a p e r , t h e u s e r i n i t i a t e s a NEW d i a l o g u e a t P o i n t A . The a u t o m a t e d i n t e r f a c e e x p e r t t h e n t a k e s t h e i n i t i a t i v e i n t h e c o n v e r s a t i o n , a s k i n g f i r s t f o r t h e name o f t h e new f i l e , t h e n f o r t h e n a m e s o f t h e
file's fields. The file name wlll be u s e d t o
dlstlngulsh the new file from others during the
acquisition process. The field names are entered
into the lexicon as the names of attributes and are
put on an agenda so that further questions about
the fields may be asked subsequently of the user.
At this point, TED still does not know what
type of objects the data in the new file concern.
Thus, as its next task, TED asks for words that
might be used as generic names for the subjects of
the file. Then, at Point E, TED acquires
Information about how to identify one of these s u b j e c t s c o t h e u s e r a n d , a t P o i n t F , d e t e r m i n e s w h a t k i n d s o f p r o n o u n s m i g h t be u s e d t o r e f e r t o o n e o f t h e s u b j e c t s . (As r e g a r d s s h i p s , TED i s f o o l e d , b e c a u s e s h i p s may b e r e f e r r e d t o b y " s h e . " )
TED i s p r o g r a - , ~ e d wlch the knowledge t h a t the identifier of an object must be some kind of name, r a t h e r t h a n a numeric q u a n t i t y o r B o o l e a n v a l u e .
Thus, TED can assume a priori that the NAME field
given in Interaction E is symbolic in nature. At
P o i n t G, TED acquires p o s s i b l e s y n o n y m s f o r NAME. TED t h e n c y c l e s t h r o u g h a l l t h e o t h e r f i e l d s , a c q u i r i n g i n f o r m a t i o n a b o u t t h e i r i n d i v i d u a l s e m a n t i c s . At P o i n t H, TED a s k s a b o u t t h e CLASS f i e l d , b u t t h e u s e r d o e s n ' t u n d e r s t a n d t h e q u e s t i o n . By t y p i n g a q u e s t i o n e u ' r k , t h e u s e r c a u s e s TED t o g i v e a m o r e d e t a i l e d e x p l a n a t i o n o f w h a t i t n e e d s . E v e r y q u e s t i o n TED a s k s h a s a t l e a s t t w o l e v e l s o f e x p l a n a t i o n t h a t a u s e r may c a l l u p o n f o r c l a r i f i c a t i o n . F o r e x a m p l e , t h e u s e r a g a i n h a s t r o u b l e a t J , w h e r e u p o n h e r e c e i v e s a n e x t e n d e d e x p l a n a t i o n w i t h a n e x a m p l e . S e e T a l s o .
D e p e n d i n g u p o n w h e t h e r a f i e l d i s s y m b o l i c , a r i t h n e t i c o r B o o l e a n , TED m a k e s d i f f e r e n t f o r m s o f e n t r i e s i n i t s l e x i c o n a n d s e e k s t o a c q u i r e d i f f e r e n t t y p e s o f i n f o r m a t i o n a b o u t t h e f i e l d . F o r e x a m p l e , a s a t P o i n t s J , K a n d ¥ , TED a s k s w h e t h e r symbolic field values can be used as m o d i f i e r s (usually i n n o u n - ~ o u n c o m b i n a t i o n s ) . F o r
a r i t h m e t i c f i e l d s , TED l o o k s f o r a d j e c t i v e s a s s o c i a t e d w i t h s c a l e s , a s i s i l l u s t r a t e d b y t h e s e q u e n c e 0PQR. O n c e TED h a s a w o r d s u c h a s OLD, i t a s s u m e s MORE OLD, OLDER a n d OLDEST may a l s o b e u s e d . (GOOD-BETTER-BEST r e q u i r e s special i n t e r v e n t i o n . )
N o t e t h e a g g r e s s i v e u s e of p r e v i o u s l y a c q u i r e d information i n formulating new q u e s t i o n s t o t h e user (as in the use of AGE, and SHIP at Point P).
We have found that this aids considerably in
k e e p i n g t h e u s e r f o c u s e d o n t h e c u r r e n t i t e m s o f i n t e r e s t c o t h e s y s t e m a n d h e l p s t o k e e p i n t e r a c t i o n s b r i e f .
O n c e TED h a s a c q u i r e d local i n f o r m a t i o n a b o u t a new f i l e , i t s e e k s t o r e l a t e i t t o a l l known files, including t h e new file itself. A t P o i n t s Z t h r o u g h B+, TED d i s c o v e r s c h a t the * S H I P * file may
be Joined w i t h itself. That is, one of the
attrlbutes of a ship is yet another ship (the
escorted shlp)j which may itself be described in
the same file. The need for this information is
i l l u s t r a t e d b y t h e q u e r y t h e u s e r p o s e s a t P o i n t G+.
TO b e t t e r i l l u s t r a t e l i n k a g e s b e t w e e n f i l e s , t h e t r a n s c r i p t i n c l u d e s t h e a c q u i s i t i o n o f a s e c o n d
file about ship classes, beginnlng at Point J + .
Much of thls dialogue is omitted b u t , aC L÷s TED
l e a r n s t h e r e i s a l i n k b e t w e e n t h e * S H I P * a n d *CLASS* files. At /4+ it l e a r n s t h e d i r e c t i o n of
t h i s l i n k ; a t N+ and O+ i t l e a r n s t h e f i e l d s upon w h i c h t h e J o i n must be made; a t P+ it l e a r n s t h e a t t r i b u t e s i n h e r i t e d t h r o u g h t h e llnk. This i n f o r m a t i o n I s u s e d , f o r e x a m p l e , I n a n s w e r i n g t h e q u e r y a t S+. TED c o n v e r t s t h e u s e r ' s q u e s t i o n "What I s t h e s p e e d of t h e h o e l ? " i n t o ' ~ h a t i s t h e s p e e d of t h e c l a s s whose C N ~ i s e q u a l t o t h e CLASS of t h e h o e l ? . "
Of c o u r s e , t h e whole p u r p o s e o f t h e NEW d i a l o g u e s i s t o make i t p o s s i b l e f o r u s e r s t o a s k q u e s t i o n s of t h e i r d a t a b a s e s i n E n g l i s h . Examples o f E n g l i s h i n p u t s a c c e p t e d by TED a r e shown a t P o i n t s E+ t h r o u g h I + , and S+ and T+ I n t h e t r a n s c r i p t . Note t h e u s e of noun-noun c o m b i n a t i o n s , s u p e r l a t i v e s and a r i t h m e t i c . A l t h o u g h n o t i l l u s t r a t e d , TED a l s o s u p p o r t s a l l t h e
available LADDER facilities of ellipsis, spelling
c o r r e c t i o n , r u n - t i m e g r a m , ~ r e x t e n s i o n end i n t r o s p e c t i o n .
V THE PRACHATIC GRAMMAR
The p r a g m a t i c grammar u s e d by TED i n c l u d e s s p e c i a l s y n t a c t i c / s e m a n t i c c a t e g o r i e s t h a t a r e a c q u i r e d by t h e NEW d i a l o g u e s . In o u r a c t u a l i m p l e m e n t a t i o n , t h e s e h a v e r a t h e r awkward names, b u t t h e y c o r r e s p o n d a p p r o x / m a c e l y t o t h e f o l l o w i n g : * <GENERIC> i s t h e c a t e g o r y f o r t h e g e n e r i c names of t h e o b j e c t s i n f i l e s . L e x l c a l p r o p e r t i e s f o r t h i s c a t e g o r y i n c l u d e t h e name of t h e r e l e v a n t f i l e ( s ) and t h e names of t h e f i e l d s t h a t c a n be u s e d Co i d e n t i f y one of t h e o b j e c t s t o t h e u s e r . See t r a n s c r i p t P o i n t s D and E.
* <ID.VALUE> is the category for the
i d e n t i f i e r s of s u b j e c t s o f i n d i v i d u a l r e c o r d s ( i . e . , k e y - f i e l d v a l u e s ) . For e x a m p l e , f o r t h e *SHIP* f i l e , i t c o n t a i n s t h e v a l u e s of t h e NAME f i e l d . See t r a n s c r i p t P o i n t E.
* <MOD.VALUE> i s the category for the v a l u e s
of database fields that can serve as
m o d i f i e r s . See P o i n t s J and K.
* <NUM.ATTP.>, <SYM.ATTR>, and <BOOL.ATTP.> a r e n , - - e r i c , s y m b o l i c and B o o l e a n a t t r i b u t e s , r e s p e c t i v e l y . They i n c l u d e t h e names of a l l d a t a b a s e f i e l d s and t h e i r synonyms. * <+NUM.ADJ> i s t h e c a t e g o r y f o r a d j e c t i v e s
( e . g . OLD) a s s o c i a t e d with n u m e r i c f i e l d s . L e x l c a l p r o p e r t i e s i n c l u d e t h e name of t h e a s s o c i a t e d f i e l d and f l i e s , a s v e i l a s i n f o r m a t i o n r e g a r d i n g w h e t h e r t h e a d j e c t i v e i s a s s o c i a t e d w i t h g r e a t e r ( a s I n OLD) o r l e s s e r ( a s i n YOUNG) v a l u e s i n t h e f i e l d . See P o i n t s P, Q and R.
* <COMP.ADJ> and <SUPERLATIVE> a r e d e r i v e d f r o = <+NUM.ADJ>.
Shown below a r e some i l l u s t r a t i v e p r a g m a t i c p r o d u c t i o n r u l e s f o r n o n l e x l c a l c a t e g o r i e s . As i n t h e f o r e g o i n g e x a m p l e s , t h e s e a r e n o t e x a c t l y t h e r u l e s u s e d by TED, b u t t h e y do c o n v e y t h e unCure of t h e a p p r o a c h .
<S> -> <PRESENT> THE <ATTP.> OF <ITEM> what is the age of the reeves HOW <+NUM.ADJ> <BE> <ITEM>
how o l d i s the y o u n g e s t s h i p <WHDET> <ITEM> <HAVE> <FEATURE>
what l e a h y s h i p s have a d o c t o r <WHDET> <ITEM> <BE> <COMPLEMENT>
which s h i p s a r e o l d e r t h e n r e e v e s <PRESENT> -> WHAT <BE>
PRINT <ATrR> -> <NUM.ATTR>
<SYM.ATTR> <BOOL.ATTK> <ITEM> -> <GENERIC>
s h i p s <ID.VALUE>
r e e v e s THE <ITEM>
the oldest shlp <MOD.VALUE> <ITEM>
leahy ships <SUPERLATIVE> <ITEM>
f a s t e s t s h i p w i t h • d o c t o r <ITEM> <WITH> <FEATURE>
s h i p with a speed greater than 12 <FEATURE> -> <BOOL.ATTR>
d o c t o r / p o i s o n o u s
<NUN.ATTE> <NUM.COMP> <NUMBER>
age of 15
<NUM.ATTR.> <NUM.COMP> <ITEM> a g e g r e a t e r t h a n r e e v e s
<NUM.COMP> -> <COMP.ADJ> THAN OF
(GREATER> THAN
<COMPLEMENT> -> <COMP.A/kJ> THAN <ITEM> <COMP.ADJ> THAN <NUMBER>
These p r a g m a t i c E r a - m a r r u l e s a r e v e r y much l i k e t h e ones u s e d in LADDER [ 2 ] , b u t t h e y d i f f e r from t h o s e of LADDER i n two c r i t i c a l w a y s .
(1) They c a p t u r e t h e p r a g m a t i c s o f a c c e s s i n g d a t a b a s e s w i t h o u t f o r c i b l y £ncludin8 i n f o r m a t i o n a b o u t t h e p r a S m a t i c s of any one p a r t i c u l a r s e t of d a t a .
( 2 ) They use s ~ t s c t 4 ~ / s e m a n t i c c a t e g o r i e s t h a t s u p p o r t t h e p r o c e s s e s o f accessln8 d a t a b a s e s , b u t t h a t a r e d o m s i n - i n d e p e n d e n t and e a s i l y a c q u i r a b l e . I t i s w o r t h n o t i n g t h a t , e v e n when a p s r C l c u l a r a p p l i c a t i o n r e q u i r e s t h e i n t r o d u c t i o n o f S p e c i a l - p u r p o s e r u l e s , t h e b a s i c p r a g m a t l c grmamar u s e d by TED p r o v i d e s a s t a r t i n g p o i n t from w h l c h d o m a i n - s p e c i f i c f e a t u r e s c a n be a d d e d .
VI DIRECTIONS FOR FURTHER WORK
t o p r o v i d e adequate s y n t a c t i c and c o n c e p t u a l
coverage, as well as t o increase the ease with
which systems may be adapted to new databases. A severe limitation of the current TED system i s i t s r e s t r i c t e d r a n g e o f s y n t a c t i c c o v e r a g e . For e x a m p l e , TED d e a l s o n l y w i t h t h e v e r b s BE and HAVE, and d o e s n o t know a b o u t u n i t s ( e . g . , t h e W a d d e l ' s a g e i s 1 5 . 5 , n o t 15.5 YEARS). To remove t h i s l i m i t a t i o n , t h e SRI NL g r o u p i s c u r r e n t l y a d a p t i n g J a n e R o b i n s o n ' s e x t e n s i v e DIAGRAM grammar {7] f o r u s e i n a s u c c e s s o r Co TED. I n p r e p a r a t i o n f o r t h e l a t t e r , we a r e e x p e r i m e n t i n g w i t h v e r b a c q u i s i t i o n dialogues such as the following:
> VERB
P l e a s e c o n j u g a t e t h e v e r b
(e.g. fly flew flown) > EARN EARNED EARNED EARN is:
1 i n t r a n s i t i v e (John d i n e s ) 2 t r a n s i t i v e ( J o h n e a t s d i n n e r )
3 d i c r a n s i t i v e ( J o h n c o o k s Mary d i n n e r ) (Choose t h e most g e n e r a l p a t t e r n ) > 2
who or what is EARNED? > A SALARY
w h o or what EARNS A SALARY? > AN EMPLOYEE can A SALARY be EARNED by AN EMPLOYEE? > YES c a n A SALARY EARN? > NO
can AN ~dPLOYEE EARN? > NO
Ok:, an EMPLOYEE can EARN a SALARY
What database field identifies an EMPLOYEE? > NAME
What database field identifies a SALARY? > SALARY
extensive conceptual and symtacclc coverage
continues to pose a challenge to research, a
polished version of t h e TED p r o t o t y p e , e v e n with i t s limited coverage, would appear to have high potential as a useful tool for data access.
KEFER£NCES
1. L . R . H a r r i s , " U s e r O r i e n t e d D a t a Base Query w i t h t h e ROBOT N a t u r a l Language Query S y s t e m , " P r o c . T h i r d I n t e r n a t i o n a l C o n f e r e n c e o.~n Vet [ L a r g e Data B a s e s ; Tokyo ( O c t o b e r 1 9 7 7 ) . 2. G . G . H e n d r i x , E. D. S e c e r d o t i , D. S a g a l o w i c z ,
a n d J . Slocum, " D e v e l o p i n g a N a t u r a l Language I n t e r f a c e t o Complex D a t a , " ACH T r a n s a c t i o n s on Database Systems , Vol. 3,--~. 2 (June
1978).
3. G . G . H e n d r i x , "Human E n g i n e e r i n g f o r A p p l i e d
N a t u r a l Language P r o c e s s i n g , " P r o c . 5 t h I n t e r n a t i o n a l J o i n t C o n f e r e n c e on A r t i f i c i a l
4.
5.
The g r e a t e s t c h a l l e n g e t o e x t e n d i n g s y s t e m s l i k e TED i s t o i n c r e a s e t h e i r c o n c e p t u a l c o v e r a g e . As p o i n t e d o u t by T e n n a n t [ 8 ] , umers who a r e
accorded n a t u r a l - l a n g u a g e access co a database 6.
e x p e c t n o t o n l y t o r e t r i e v e i n f o r m a t i o n d i r e c t l y s t o r e d t h e r e , b u t a l s o co compute " r e a s o n a b l e " d e r i v a t i v e i n f o r m a t i o n . For e x a m p l e , i f a d a t a b a s e
has the location of two ships, users will expect
t h e s y s t e m t o be a b l e t o p r o v i d e t h e d i s t a n c e b e t w e e n t h e m - - a n i t e m of i n f o r m a t i o n n o t d i r e c t l y 7. r e c o r d e d i n t h e d a t a b a s e , b u t e a s i l y computed from
the existing data. In general, any system that is
tO be w i d e l y accepted by u s e r s must n o t o n l y
provide access t o primary information, but uast
a l s o e n h a n c e t h e l a t t e r w i t h p r o c e d u r e s t h a t 8. c a l c u l a t e s e c o n d a r y a t t r i b u t e s from t h e d a t a a c t u a l l y s t o r e d . D a t a e n h a n c e m e n t p r o c e d u r e s a r e c u r r e n t l y p r o v i d e d by LADDER and a few o t h e r h a n d - b u i l t s y s t e m s , b u t work i s n e e d e d now t o d e v i s e means f o r a l l o w i n g s y s t e m u s e r s t o s p e c i f y t h e i r
own d a t a b a s e enhancement functions and to c o u p l e 9. t h e s e wlth the natural-language component.
A s e c o n d i s s u e a s s o c i a t e d w i t h c o n c e p t u a l
c o v e r a g e i s t h e a b i l i t y t o a c c e s s i n f o r m a t i o n e x t r i n s i c t o t h e d a t a b a s e p e r s e , s u c h a s where t h e d a t a a r e s t o r e d and how t h e f i e l d s a r e d e f i n e d , a s 10. well as information a b o u t t h e s t a t u s of t h e q u e r y system itself.
In summary, s y s t e m s s u c h a s LADDER a r e of
l i m i t e d u t i l i t y u n l e s s t h e y c a n be t r a n s p o r t e d t o new d a t a b a s e s by p e o p l e w i t h no s i g n i f i c a n t f o r m a l t r a i n i n g in c o m p u t e r s c i e n c e . A l t h o u g h t h e d e v e l o p m e n t of u s e r - s p e c i f i a b l e s y s t e m s w i t h
I n t e l l i g e n c e , Cambridge, Massachusetts (August
1977).
G. G. N e n d r i x , D. S a g a l o w l c z a n d E. D.
S a c e r d o t i , " R e s e a r c h on T r a n s p o r t a b l e E n g l i s h - A c c e s s H e d i a t o D i s t r i b u t e d and L o c a l Data B a s e s , " P r o p o s a l ECU 79-I03, A r t i f i c i a l
I n t e l l i g e n c e C e n t e r , SRI I n t e r n a t i o n a l , Menlo
P a r k , C a l i f o r n i a (November 1 9 7 9 ) .
R. C. Moore, " K a n d l i n g Complex Q u e r i e s i n a D i s t r i b u t e d D a t a E a s e , " T e c h n i c a l Note 170, A r t i f i c i a l I n t e l l i g e n c e C e n t e r , SRI
I n t e r n a t i o n a l Menlo P a r k , C a l i f o r n i a ( O c t o b e r 1979).
P. M o r r i s and V. S a g a l o w i c z , ' ~ l a n a g i n g Network A c c e s s t o a D i s t r i b u t e d D a t a B a s e , " P r o c . Second S e r k e l e ~ Workshop on D i s t r i b u t e d Data Hana6e~enc and Computer N e t w o r k s , g e r k e l e y , C a l i f o r n i a ~ y ~
J . J . R o b i n s o n , "DIAGRAH: A G r a ~ a a r f o r D i a l o g u e s , " T e c h n i c a l Note 205, A r t i f i c i a l I n t e l l i g e n c e C e n t e r , SRI I n t s r n a t l o n a l Menlo P a r k , C a l i f o r n i a ( F e b r u a r y 1 9 8 0 ) .
H. Tennant, ' ~ x p e r i e n c e w i t h the E v a l u a t i o n o f
N a t u r a l Language Q u e s t i o n A n s w e r e r s , " Proc%
S i x t h I n t e r n a t i o n a l J o i n t Conference on
A r t i f i c i a l I n t e l l i g e n c e , Tokyo, Japan (August
1979)o
F. g . Thompson and B. H. Thompson, " P r a c t i c a l
N a t u r a l Language P r o c e s s i n g : The REL S y s t e m a s P r o t o t y p e , " p p . 109-168, M. R u b l n o f f and M. C. ¥ o v l t s , a d s . , A d v a n c e s I n . C o m p u t e r s 1 3 (Academic P r e s s , New ¥ o ~ , 1 9 7 5 ) .
D. W a l t z , " N a t u r a l Language A c c e s s t o a L a r g e D a t a B a s e : An E n g i n e e r i n g A p p r o a c h , " P r o c . 4 t h . I n t e r n a t i o n a l J o i n t C o n f e r e n c e on A r t i f i c i a l
I n t e l l i g e n c e , T b i l i s i , USSR, pp. 868-872
(September 1975).
e-° *,.4
m
~ ^
z
" ®
~ ~
~
w-~ ¢: • m *" o
. ~ .~ , ~ ..~
. , - * V
,
. ~
~ ~ ' ; ~
~
~.~ ,~'~
~ ~ . ~ ~ ~ ~~. ~
~
. - - - - _ - - - - _ _ - - - ~ ~ , ~ A ~ ~ , ~ ^
z
t ~
: ~ ~ ~
~ ^
:~
os., ~ w
m U =
=~ <.=
=
F- :3 m:
=
~ 0 ~
,-,
~
^ L
u ~ a -
= ~ "
<
<
= ~
~
• J ~ .
A °
= ~
a N
° ~
u~
0 0 C "-"
o
=
: ~
~
= : ,m
o"
"
!
" ~ = ~ ~,
÷ +
= ~
~
_=
Z = ' ~ .
=o
164
" ~ w ZZ
~ • 0
41 ~ ~p
a
: = ~
o -
"
8
I
~ S X ~
~
~ g ~ -..
. , m , ~ ~ ~ , , - I IU
u,~ .,c
m
k
~=.
k..
m
4~ =
~ o
~
2 Z X:
4c
, . I
Z
C M ~ E~
~ J • ° . ~ 4 t
,-44~
G Ic
L :
~ 4 t
t~ *a .,=4,-4