
TRANSPORTABLE NATURAL-LANGUAGE INTERFACES TO DATABASES

by Gary G. Hendrix and William H. Lewis

SRI International
333 Ravenswood Avenue
Menlo Park, California 94025

I INTRODUCTION

Over the last few years a number of application systems have been constructed that allow users to access databases by posing questions in natural languages, such as English. When used in the restricted domains for which they have been especially designed, these systems have achieved reasonably high levels of performance. Such systems as LADDER [2], PLANES [10], ROBOT [1], and REL [9] require the encoding of knowledge about the domain of application in such constructs as database schemata, lexicons, pragmatic grammars, and the like. The creation of these data structures typically requires considerable effort on the part of a computer professional who has had special training in computational linguistics and the use of databases. Thus, the utility of these systems is severely limited by the high cost involved in developing an interface to any particular database.

This paper describes initial work on a methodology for creating natural-language processing capabilities for new domains without the need for intervention by specially trained experts. Our approach is to acquire logical schemata and lexical information through simple interactive dialogues with someone who is familiar with the form and content of the database, but unfamiliar with the technology of natural-language interfaces. To test our approach in an actual computer environment, we have developed a prototype system called TED (Transportable English Datamanager). As a result of our experience with TED, the NL group at SRI is now undertaking the development of a much more ambitious system based on the same philosophy [4].

II RESEARCH PROBLEMS

Given the demonstrated feasibility of language-access systems, such as LADDER, major research issues to be dealt with in achieving transportable database interfaces include the following:

* Information used by transportable systems must be cleanly divided into database-independent and database-dependent portions.

* Knowledge representations must be established for the database-dependent part in such a way that their form is fixed and applicable to all databases and their content readily acquirable.

* Mechanisms must be developed to enable the system to acquire information about a particular application from nonlinguists.

III THE TED PROTOTYPE

We have developed our prototype system (TED) to explore one possible approach to these problems. In essence, TED is a LADDER-like natural-language processing system for accessing databases, combined with an "automated interface expert" that interviews users to learn the language and logical structure associated with a particular database and that automatically tailors the system for use with the particular application. TED allows users to create, populate, and edit their own new local databases, to describe existing local databases, or even to describe and subsequently access heterogeneous (as in [5]) distributed databases.

Most of TED is based on and built from components of LADDER. In particular, TED uses the LIFER parser and its associated support packages [3], the SODA data access planner [5], and the FAM file access manager [6]. All of these support packages are independent of the particular database used. In LADDER, the data structures used by these components are hand-generated for a particular database by computer scientists. In TED, however, they are created by TED's automated interface expert.

Like LADDER, TED uses a pragmatic grammar; but TED's pragmatic grammar does not make any assumptions about the particular database being accessed. It assumes only that interactions with the system will concern data access or update, and that information regarding the particular database will be encoded in data structures of a prescribed form, which are created by the automated interface expert.

The executive level of TED accepts three kinds of input: questions stated in English about the data in files that have been previously described to the system; questions posed in the SODA query language; and single-word commands that initiate dialogues with the automated interface expert.
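The three-way split of executive input can be pictured with a small dispatcher. This is only a sketch under stated assumptions: the function name `classify_input` is hypothetical, and the test used to recognize a formal query (a leading parenthesis) is a placeholder, since the paper does not show the SODA syntax.

```python
# Commands known to the sketch; NEW is the only one named in the text.
KNOWN_COMMANDS = {"NEW"}

def classify_input(line):
    """Return 'command', 'soda', or 'english' for one line of user input."""
    text = line.strip()
    if not text:
        raise ValueError("empty input")
    words = text.split()
    # A single known word starts a dialogue with the interface expert.
    if len(words) == 1 and words[0].upper() in KNOWN_COMMANDS:
        return "command"
    # Placeholder assumption: a formal query starts with an opening parenthesis.
    if text.startswith("("):
        return "soda"
    # Everything else is treated as an English question.
    return "english"

for sample in ["NEW", "(a formal query)", "What is the age of the reeves?"]:
    print(sample, "->", classify_input(sample))
```

Only the first branch depends on a fixed vocabulary; the English branch would hand the text to the parser.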

IV THE AUTOMATED INTERFACE EXPERT

A. Philosophy

* There must be a range of readily understood questions that elicit all the information needed about a new database.

* The questions must be both brief and easy to understand.

* The system must appear coherent, eliciting required information in an order comfortable to the user.

* The system must provide substantial assistance, when needed, to enable a user to understand the kinds of responses that are expected.

All these points cannot be covered herein, but the sample transcript shown at the end of this paper, in conjunction with the following discussion, suggests the manner of our approach.

B. Strategy

A key strategy of TED is to first acquire information about the structure of files. Because the semantics of files is relatively well understood, the system thereby lays the foundation for subsequently acquiring information about the linguistic constructions likely to be used in questions about the data contained in the file.

One of the single-word commands accepted by the TED executive system is the command NEW, which initiates a dialogue prompting the user to supply information about the structure of a new data file. The NEW dialogue allows the user to think of the file as a table of information and asks relatively simple questions about each of the fields (columns) in the file (table).

For example, TED asks for the heading names of the columns, for possible synonyms for the heading names, and for information about the types of values (numeric, Boolean, or symbolic) that each column can contain. The heading names generally act like relational nouns, while the information about the type of values in each column provides a clue to the column's semantics. The heading name of a symbolic column tends to be the generic name for the class of objects referred to by the values of that column. Heading names for Boolean columns tend to be the names of properties that database objects can possess. If a column contains numbers, this suggests that there may be some scale with associated adjectives of degree.

To allow the system to answer questions requiring the integration of information from multiple files, the user is also asked about the interconnections between the file currently being defined and other files described previously.
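The way a column's value type hints at the linguistic role of its heading can be sketched as a small lookup. This is an illustration only, not TED's implementation: the `FieldSpec` class, the `lexical_entries` function, the role descriptions, and the sample columns are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class FieldSpec:
    """One column of a file, as a NEW-style dialogue might record it (hypothetical)."""
    heading: str
    kind: str                      # "numeric", "boolean", or "symbolic"
    synonyms: list = field(default_factory=list)

# Role suggested by each value type, following the heuristics in the text.
ROLE_BY_KIND = {
    "symbolic": "generic name for a class of objects",
    "boolean": "property an object can possess",
    "numeric": "scale with adjectives of degree",
}

def lexical_entries(spec):
    """Return (word, role) pairs for a heading and all of its synonyms."""
    if spec.kind not in ROLE_BY_KIND:
        raise ValueError(f"unknown field type: {spec.kind}")
    role = ROLE_BY_KIND[spec.kind]
    return [(word.upper(), role) for word in [spec.heading, *spec.synonyms]]

print(lexical_entries(FieldSpec("AGE", "numeric")))
print(lexical_entries(FieldSpec("DOCTOR", "boolean", ["physician"])))
```

Each synonym receives the same role as its heading, so later questions can use either word.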

C. Examples from a Transcript

In the sample transcript at the end of this paper, the user initiates a NEW dialogue at Point A. The automated interface expert then takes the initiative in the conversation, asking first for the name of the new file, then for the names of the file's fields. The file name will be used to distinguish the new file from others during the acquisition process. The field names are entered into the lexicon as the names of attributes and are put on an agenda so that further questions about the fields may be asked subsequently of the user.
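The bookkeeping described here, where field names enter the lexicon and are queued for follow-up questions, can be sketched with a dictionary and a queue. The function name `begin_new_file` and the shape of the lexicon entries are assumptions for illustration; the paper does not describe TED's internal structures at this level.

```python
from collections import deque

def begin_new_file(file_name, field_names):
    """Enter field names into a lexicon as attributes and queue them on an agenda."""
    lexicon = {}
    agenda = deque()
    for name in field_names:
        word = name.upper()
        lexicon[word] = {"category": "attribute", "file": file_name}
        agenda.append(word)
    return lexicon, agenda

lexicon, agenda = begin_new_file("SHIP", ["name", "class", "age"])
while agenda:
    field_name = agenda.popleft()
    # Follow-up questions about this field would be asked at this point.
    print("next field to ask about:", field_name)
```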

At this point, TED still does not know what type of objects the data in the new file concern. Thus, as its next task, TED asks for words that might be used as generic names for the subjects of the file. Then, at Point E, TED acquires information about how to identify one of these subjects to the user and, at Point F, determines what kinds of pronouns might be used to refer to one of the subjects. (As regards ships, TED is fooled, because ships may be referred to by "she.")

TED is programmed with the knowledge that the identifier of an object must be some kind of name, rather than a numeric quantity or Boolean value. Thus, TED can assume a priori that the NAME field given in Interaction E is symbolic in nature. At Point G, TED acquires possible synonyms for NAME.

TED then cycles through all the other fields, acquiring information about their individual semantics. At Point H, TED asks about the CLASS field, but the user doesn't understand the question. By typing a question mark, the user causes TED to give a more detailed explanation of what it needs. Every question TED asks has at least two levels of explanation that a user may call upon for clarification. For example, the user again has trouble at J, whereupon he receives an extended explanation with an example. See T also.
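The escalating help described above, where each question mark produces a fuller explanation, can be sketched as follows. The function `ask`, its arguments, and the sample wording are hypothetical; replies are passed in as a list so that the sketch runs without terminal input.

```python
def ask(prompt, explanations, replies):
    """Ask `prompt`; each '?' reply shows the next, more detailed explanation.

    `explanations` must hold at least one text. Returns the first reply that
    is not '?', together with everything that was shown to the user.
    """
    shown = [prompt]
    level = 0
    for reply in replies:
        if reply.strip() != "?":
            return reply.strip(), shown
        # Stay on the most detailed explanation once the levels run out.
        shown.append(explanations[min(level, len(explanations) - 1)])
        level += 1
    raise ValueError("no answer supplied")

answer, shown = ask(
    "Type of the CLASS field?",
    ["Is it numeric, Boolean, or symbolic?",
     "Symbolic fields hold names; numeric fields hold numbers; Boolean fields hold yes/no values."],
    ["?", "?", "symbolic"],
)
print(answer)
print(len(shown) - 1, "explanations were shown")
```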

Depending upon whether a field is symbolic, arithmetic or Boolean, TED makes different forms of entries in its lexicon and seeks to acquire different types of information about the field. For example, as at Points J, K and Y, TED asks whether symbolic field values can be used as modifiers (usually in noun-noun combinations). For arithmetic fields, TED looks for adjectives associated with scales, as is illustrated by the sequence OPQR. Once TED has a word such as OLD, it assumes MORE OLD, OLDER and OLDEST may also be used. (GOOD-BETTER-BEST requires special intervention.)
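Deriving the degree forms of an acquired adjective can be sketched in a few lines. The suffix rule below (add ER and EST, dropping a final E first) is an assumption chosen for the example, not a description of TED's morphology; irregular adjectives are supplied by hand, mirroring the special intervention mentioned for GOOD.

```python
def degree_forms(adjective, irregular=None):
    """Return the degree forms assumed for a scale adjective.

    Regular adjectives yield [MORE X, comparative, superlative]; irregular
    ones yield only the hand-supplied (comparative, superlative) pair.
    """
    adj = adjective.upper()
    if irregular and adj in irregular:
        return list(irregular[adj])
    # Simplified suffix rule: drop a final E before adding ER / EST.
    stem = adj[:-1] if adj.endswith("E") else adj
    return [f"MORE {adj}", stem + "ER", stem + "EST"]

print(degree_forms("old"))
print(degree_forms("good", {"GOOD": ("BETTER", "BEST")}))
```

The rule ignores consonant doubling (BIG, BIGGER), which a fuller treatment would need.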

Note the aggressive use of previously acquired information in formulating new questions to the user (as in the use of AGE and SHIP at Point P). We have found that this aids considerably in keeping the user focused on the current items of interest to the system and helps to keep interactions brief.

Once TED has acquired local information about a new file, it seeks to relate it to all known files, including the new file itself. At Points Z through B+, TED discovers that the *SHIP* file may be joined with itself. That is, one of the attributes of a ship is yet another ship (the escorted ship), which may itself be described in the same file. The need for this information is illustrated by the query the user poses at Point G+.

To better illustrate linkages between files, the transcript includes the acquisition of a second file about ship classes, beginning at Point J+. Much of this dialogue is omitted but, at L+, TED learns there is a link between the *SHIP* and *CLASS* files. At M+ it learns the direction of this link; at N+ and O+ it learns the fields upon which the join must be made; at P+ it learns the attributes inherited through the link. This information is used, for example, in answering the query at S+. TED converts the user's question "What is the speed of the hoel?" into "What is the speed of the class whose CNAME is equal to the CLASS of the hoel?"
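The effect of a link between two files can be sketched with plain dictionaries. Everything concrete below is an assumption for illustration: the record values are invented, the key field of the class file is called `CNAME` here, and the `LINK` structure is a hypothetical stand-in for whatever TED actually stores about direction, join fields, and inherited attributes.

```python
# Invented sample records; the class names and speeds are not from the paper.
SHIP = [{"NAME": "HOEL", "CLASS": "ADAMS"}, {"NAME": "REEVES", "CLASS": "LEAHY"}]
CLASS = [{"CNAME": "ADAMS", "SPEED": 30}, {"CNAME": "LEAHY", "SPEED": 32}]
FILES = {"SHIP": SHIP, "CLASS": CLASS}

# Hypothetical record of a link: direction, join fields, inherited attributes.
LINK = {"from": "SHIP", "to": "CLASS",
        "from_field": "CLASS", "to_field": "CNAME",
        "inherits": {"SPEED"}}

def attribute_of(ship_name, attr):
    """Look up an attribute of a ship, following the link when it is inherited."""
    record = next(r for r in FILES[LINK["from"]] if r["NAME"] == ship_name.upper())
    if attr in record:
        return record[attr]
    if attr in LINK["inherits"]:
        key = record[LINK["from_field"]]
        linked = next(r for r in FILES[LINK["to"]] if r[LINK["to_field"]] == key)
        return linked[attr]
    raise KeyError(attr)

print(attribute_of("hoel", "SPEED"))
```

The ship record has no SPEED field of its own, so the lookup falls through to the class record whose key matches the ship's CLASS value.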

Of course, the whole purpose of the NEW dialogues is to make it possible for users to ask questions of their databases in English. Examples of English inputs accepted by TED are shown at Points E+ through I+, and S+ and T+ in the transcript. Note the use of noun-noun combinations, superlatives and arithmetic. Although not illustrated, TED also supports all the available LADDER facilities of ellipsis, spelling correction, run-time grammar extension and introspection.

V THE PRAGMATIC GRAMMAR

The pragmatic grammar used by TED includes special syntactic/semantic categories that are acquired by the NEW dialogues. In our actual implementation, these have rather awkward names, but they correspond approximately to the following:

* <GENERIC> is the category for the generic names of the objects in files. Lexical properties for this category include the name of the relevant file(s) and the names of the fields that can be used to identify one of the objects to the user. See transcript Points D and E.

* <ID.VALUE> is the category for the identifiers of subjects of individual records (i.e., key-field values). For example, for the *SHIP* file, it contains the values of the NAME field. See transcript Point E.

* <MOD.VALUE> is the category for the values of database fields that can serve as modifiers. See Points J and K.

* <NUM.ATTR>, <SYM.ATTR>, and <BOOL.ATTR> are numeric, symbolic and Boolean attributes, respectively. They include the names of all database fields and their synonyms.

* <+NUM.ADJ> is the category for adjectives (e.g. OLD) associated with numeric fields. Lexical properties include the name of the associated field and files, as well as information regarding whether the adjective is associated with greater (as in OLD) or lesser (as in YOUNG) values in the field. See Points P, Q and R.

* <COMP.ADJ> and <SUPERLATIVE> are derived from <+NUM.ADJ>.
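The lexical properties listed for <+NUM.ADJ> (associated file, field, and direction) are enough to evaluate a superlative. The sketch below assumes a hypothetical `NumAdj` record and invented ship ages, apart from the Waddel's age of 15.5, which the paper mentions elsewhere.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NumAdj:
    """Hypothetical lexical entry for an adjective tied to a numeric field."""
    word: str
    file: str
    field: str
    greater: bool   # True if the adjective goes with larger values (OLD)

LEXICON = {
    "OLD": NumAdj("OLD", "SHIP", "AGE", True),
    "YOUNG": NumAdj("YOUNG", "SHIP", "AGE", False),
}

# Sample records; the ages of REEVES and HOEL are invented.
SHIPS = [{"NAME": "REEVES", "AGE": 16},
         {"NAME": "HOEL", "AGE": 18},
         {"NAME": "WADDEL", "AGE": 15.5}]

def superlative(adjective, records):
    """Pick the record at the extreme end of the adjective's scale."""
    adj = LEXICON[adjective.upper()]
    pick = max if adj.greater else min
    return pick(records, key=lambda r: r[adj.field])

print(superlative("old", SHIPS)["NAME"])
print(superlative("young", SHIPS)["NAME"])
```

The direction flag is what lets one evaluation routine serve both OLDEST and YOUNGEST.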

Shown below are some illustrative pragmatic production rules for nonlexical categories. As in the foregoing examples, these are not exactly the rules used by TED, but they do convey the nature of the approach.

<S> -> <PRESENT> THE <ATTR> OF <ITEM>
           what is the age of the reeves
       HOW <+NUM.ADJ> <BE> <ITEM>
           how old is the youngest ship
       <WHDET> <ITEM> <HAVE> <FEATURE>
           what leahy ships have a doctor
       <WHDET> <ITEM> <BE> <COMPLEMENT>
           which ships are older than reeves

<PRESENT> -> WHAT <BE>
             PRINT

<ATTR> -> <NUM.ATTR>
          <SYM.ATTR>
          <BOOL.ATTR>

<ITEM> -> <GENERIC>
              ships
          <ID.VALUE>
              reeves
          THE <ITEM>
              the oldest ship
          <MOD.VALUE> <ITEM>
              leahy ships
          <SUPERLATIVE> <ITEM>
              fastest ship with a doctor
          <ITEM> <WITH> <FEATURE>
              ship with a speed greater than 12

<FEATURE> -> <BOOL.ATTR>
                 doctor / poisonous
             <NUM.ATTR> <NUM.COMP> <NUMBER>
                 age of 15
             <NUM.ATTR> <NUM.COMP> <ITEM>
                 age greater than reeves

<NUM.COMP> -> <COMP.ADJ> THAN
              OF
              <GREATER> THAN

<COMPLEMENT> -> <COMP.ADJ> THAN <ITEM>
                <COMP.ADJ> THAN <NUMBER>
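The separation between fixed rules and an acquired lexicon can be illustrated with a toy matcher. This is not LIFER and does not implement the recursive rules above: it flattens two sentence shapes into fixed patterns in which every category matches exactly one word. The pattern names and the lexicon contents are assumptions for the example.

```python
# Database-dependent part: words acquired for one particular file (sample data).
LEXICON = {
    "<BE>": {"IS", "ARE"},
    "<ATTR>": {"AGE", "SPEED", "CLASS"},
    "<ID.VALUE>": {"REEVES", "HOEL"},
    "<+NUM.ADJ>": {"OLD", "FAST"},
}

# Database-independent part: flattened sentence patterns (a simplification).
PATTERNS = {
    "attribute-of": ["WHAT", "<BE>", "THE", "<ATTR>", "OF", "THE", "<ID.VALUE>"],
    "how-adj": ["HOW", "<+NUM.ADJ>", "<BE>", "THE", "<ID.VALUE>"],
}

def match(tokens, pattern):
    """Return category bindings if tokens fit the pattern, else None."""
    if len(tokens) != len(pattern):
        return None
    bindings = {}
    for token, symbol in zip(tokens, pattern):
        if symbol.startswith("<"):
            if token not in LEXICON[symbol]:
                return None
            bindings[symbol] = token
        elif token != symbol:
            return None
    return bindings

def parse(sentence):
    """Try each pattern in turn; return (pattern name, bindings) or None."""
    tokens = sentence.upper().rstrip("?").split()
    for name, pattern in PATTERNS.items():
        bindings = match(tokens, pattern)
        if bindings is not None:
            return name, bindings
    return None

print(parse("What is the age of the reeves?"))
```

Moving the sketch to another database would mean replacing `LEXICON` only; `PATTERNS` would stay as they are.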

These pragmatic grammar rules are very much like the ones used in LADDER [2], but they differ from those of LADDER in two critical ways.

(1) They capture the pragmatics of accessing databases without forcibly including information about the pragmatics of any one particular set of data.

(2) They use syntactic/semantic categories that support the processes of accessing databases, but that are domain-independent and easily acquirable.

It is worth noting that, even when a particular application requires the introduction of special-purpose rules, the basic pragmatic grammar used by TED provides a starting point from which domain-specific features can be added.

VI DIRECTIONS FOR FURTHER WORK

[...] to provide adequate syntactic and conceptual coverage, as well as to increase the ease with which systems may be adapted to new databases.

A severe limitation of the current TED system is its restricted range of syntactic coverage. For example, TED deals only with the verbs BE and HAVE, and does not know about units (e.g., the Waddel's age is 15.5, not 15.5 YEARS). To remove this limitation, the SRI NL group is currently adapting Jane Robinson's extensive DIAGRAM grammar [7] for use in a successor to TED. In preparation for the latter, we are experimenting with verb acquisition dialogues such as the following:

> VERB
Please conjugate the verb
(e.g. fly flew flown) > EARN EARNED EARNED
EARN is:
  1 intransitive (John dines)
  2 transitive (John eats dinner)
  3 ditransitive (John cooks Mary dinner)
(Choose the most general pattern) > 2
who or what is EARNED? > A SALARY
who or what EARNS A SALARY? > AN EMPLOYEE
can A SALARY be EARNED by AN EMPLOYEE? > YES
can A SALARY EARN? > NO
can AN EMPLOYEE EARN? > NO
Ok, an EMPLOYEE can EARN a SALARY
What database field identifies an EMPLOYEE? > NAME
What database field identifies a SALARY? > SALARY
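The answers collected in such a dialogue could be gathered into a single record. The sketch below is a guess at a convenient shape, not the successor system's representation: `VerbFrame`, `build_frame`, and `noun` are hypothetical names, and the record keeps only what the dialogue above elicits.

```python
from dataclasses import dataclass

# Menu numbers from the dialogue mapped to pattern names.
PATTERNS = {1: "intransitive", 2: "transitive", 3: "ditransitive"}

@dataclass
class VerbFrame:
    """Hypothetical record of what a verb acquisition dialogue elicits."""
    base: str
    past: str
    participle: str
    pattern: str
    subject: str
    direct_object: str
    subject_field: str
    object_field: str
    passive_ok: bool

def noun(phrase):
    """Drop a leading article: 'AN EMPLOYEE' becomes 'EMPLOYEE'."""
    words = phrase.upper().split()
    if len(words) > 1 and words[0] in {"A", "AN", "THE"}:
        words = words[1:]
    return " ".join(words)

def build_frame(conjugation, choice, obj, subj, passive_ok, subj_field, obj_field):
    """Assemble a frame from the answers, in the order the dialogue asks for them."""
    base, past, participle = conjugation.upper().split()
    return VerbFrame(base, past, participle, PATTERNS[choice],
                     noun(subj), noun(obj), subj_field, obj_field, passive_ok)

frame = build_frame("EARN EARNED EARNED", 2, "A SALARY", "AN EMPLOYEE",
                    True, "NAME", "SALARY")
print(frame)
```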

The greatest challenge to extending systems like TED is to increase their conceptual coverage. As pointed out by Tennant [8], users who are accorded natural-language access to a database expect not only to retrieve information directly stored there, but also to compute "reasonable" derivative information. For example, if a database has the location of two ships, users will expect the system to be able to provide the distance between them, an item of information not directly recorded in the database, but easily computed from the existing data. In general, any system that is to be widely accepted by users must not only provide access to primary information, but must also enhance the latter with procedures that calculate secondary attributes from the data actually stored. Data enhancement procedures are currently provided by LADDER and a few other hand-built systems, but work is needed now to devise means for allowing system users to specify their own database enhancement functions and to couple these with the natural-language component.

A second issue associated with conceptual coverage is the ability to access information extrinsic to the database per se, such as where the data are stored and how the fields are defined, as well as information about the status of the query system itself.

In summary, systems such as LADDER are of limited utility unless they can be transported to new databases by people with no significant formal training in computer science. Although the development of user-specifiable systems with extensive conceptual and syntactic coverage continues to pose a challenge to research, a polished version of the TED prototype, even with its limited coverage, would appear to have high potential as a useful tool for data access.

REFERENCES

1. L. R. Harris, "User Oriented Data Base Query with the ROBOT Natural Language Query System," Proc. Third International Conference on Very Large Data Bases, Tokyo (October 1977).

2. G. G. Hendrix, E. D. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a Natural Language Interface to Complex Data," ACM Transactions on Database Systems, Vol. 3, No. 2 (June 1978).

3. G. G. Hendrix, "Human Engineering for Applied Natural Language Processing," Proc. 5th International Joint Conference on Artificial Intelligence, Cambridge, Massachusetts (August 1977).

4. G. G. Hendrix, D. Sagalowicz and E. D. Sacerdoti, "Research on Transportable English-Access Media to Distributed and Local Data Bases," Proposal ECU 79-103, Artificial Intelligence Center, SRI International, Menlo Park, California (November 1979).

5. R. C. Moore, "Handling Complex Queries in a Distributed Data Base," Technical Note 170, Artificial Intelligence Center, SRI International, Menlo Park, California (October 1979).

6. P. Morris and D. Sagalowicz, "Managing Network Access to a Distributed Data Base," Proc. Second Berkeley Workshop on Distributed Data Management and Computer Networks, Berkeley, California.

7. J. J. Robinson, "DIAGRAM: A Grammar for Dialogues," Technical Note 205, Artificial Intelligence Center, SRI International, Menlo Park, California (February 1980).

8. H. Tennant, "Experience with the Evaluation of Natural Language Question Answerers," Proc. Sixth International Joint Conference on Artificial Intelligence, Tokyo, Japan (August 1979).

9. F. B. Thompson and B. H. Thompson, "Practical Natural Language Processing: The REL System as Prototype," pp. 109-168, M. Rubinoff and M. C. Yovits, eds., Advances in Computers 13 (Academic Press, New York, 1975).

10. D. Waltz, "Natural Language Access to a Large Data Base: An Engineering Approach," Proc. 4th International Joint Conference on Artificial Intelligence, Tbilisi, USSR, pp. 868-872 (September 1975).

[Sample transcript of a TED session, referred to in the text by Points A through T+; illegible in this copy.]
