• No results found

Collocational Grammar as a Model for Human Computer Interaction

N/A
N/A
Protected

Academic year: 2020

Share "Collocational Grammar as a Model for Human Computer Interaction"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

COLLOCATIONAL GRAMMAR AS A MODEL FOR H I NAN-COMPUTER INTERACTION

W. R a n d o l p h F o r d P r i s m A s s o c i a t e s

• 7402 York Road, S u i t s 301 Towson, M a r y l a n d 21204

l ~ , c u l N. S m i t h

GTE L a b o r a t o r i e s , I n c . 40 S a l v a n Road

Waltham, M a s s a c h u s e t t s 02254

C o n t r a r y t o t h e l o n g - h e l d b e l i e f o f t r a n s f o ~ n a t i o n a l E ~ m m a r i a n s f o r c o m m u n i c a t i o n i n g e n e r a l , t h e m a j o r i t y c f natural languaEe sentences which people actually use in oom- m u n i c a t i n g w i t h a c o ~ p u t e r , i n a n u n c o n s t r a i n e d mode, a r e n o t n o v e l . As Thompson and Thompson 1981s660 o b s e l w e z " m o n o t o n y o f s t r u c t u r e i s t h e r u l e r a t h e ~ t h a n t h e e x c e p t i o n i n homan- - c o m p u t e r c o m m u n i c a t i o n . " Thompson 1981"41 r e p o r t s i n h e r study of such communications that 75 percent of the queries w e r e w h - q u e s t i o n s , 1% p e r c e n t w e r e commands, 5 p e r c e n t w e r e s t a t e m e n t s , and 1 p e r c e n t w e r e y e s / n o q u e s t i o n s .

The r e p e t i t i v e f e a t u r e o f n a t u r a l l a n g u a g e i s n o t a new c o n c e p t . S i m i l a r o b s e r v a t i o n s h a v e b e e n made b s f o r e o Damerau 1971 u s e d c o l l o c a t i o n o f l e x i o a l i t e m s a s t h e b a s i s f o r a Marker m o d e l i n a n e x p e r i m e n t f o r t e x t g e n e r a t i o n . B e c k e t c l a i m s t h a t " t h e w o n d e r f u l f e a t s o f t h e homan i n t e l l e c t . . , a r e b a s e d a s much on m e m o r i z a t i o n a s on a n y impromptu p r o b l e m - -solving" (1975:62). He posits a phrasal lexicon oonslstln~ of six major categories of lexioal phrases by which we "stitch t c g e t h e r s w a t c h e s o f t e x t t h a t we h a v e h e a r ~ b e f o r e | p r o d u c t - i v e p r o c e s s e s h a v e t h e s e c o n d a r y r o l e o f a d a p t i n g t h e o l d phrases to the new situations" (1975z60)o

(2)

c o - o c c u r . T h i s s u r f a c e c o - o c c u r r e n c e i s t h e r e s u l t o f what may a t t i m e s be c o m p l i c a t e d s y n t a c t i c and s e m a n t i c i n t e r r e l a t -

i o n s o f l a n g u a g e u n i t e . U n f o r t u n a t e l y , a s y s t e m a t i c a c c o u n t - i n g o f t h e s e i n t e r r e l a t i o n s h a s n o t b e e n a c h i e v e d i n any l i n g u i s t i c t h e o r y . The t h r u s t o f o u r a p p r o a c h i s t h a t t h e more o f l a n g u a g e w h i c h c a n be h a n d l e d l e x i c a l ! y , t h e e a s i e r w i l l homan l a n g u a g e be a b l e t o be m o d e l l e d .

A c t u a l d a t a on t h e f r e q u e n c y o f l e x i c a l c o l l o c a t i o n a r e v e r y s p a r e s . A s t u d y o f word s e q u e n c e s i n PANALOG t e x t h a s shown a s u r p r i s i n g amount o f r e p e t i t i o n o f word s e q u e n c e s (Bienstock and Smith in preparation). (PANALOG is a system for passing messages among small ~ o u p s in computer confer- enclng with telemall and calendar features. See Housman 1979. The data are of human-homan communication and not homan- -computer cc~.,nunication and are thematically restricted. They t h e r e f o r e r e s e m b l e D a m e r a u ' s d a t a . ) A s t u d y o f p a r t s o f t h e Brown E n g l i s h Corpus has b e e n u n d e r t a k e n i n o r d e r t o g e t l e s s t h e m a t i c a l l y homogeneous m a t e r i a l . I n a d d i t i o n , W i z a r d o f OZ e x p e r i m e n t s w i t h u n c o n s t r a i n e d h , - . a n - c o m p u t e r i n p u t w i l l b e - g i n s o o n a t GTE L a b o r a t o r i e s i n o r d e r t o g a t h e r t h e more r e l e v a n t human-oomputeu~ d a t a .

A PANALOG t e x t o f 16,133 words c h o s e n f o r s t u d y . The l o n g e s t s t r i n g w h i c h o c c u r r e d more t h a n once was a s e v e n word q u o t e from N t e t z s c h e . Two s i x - w o r d s t r i n g s o c c u r r e d t w i c e and a t l e n g t h f i v e , one o c c u r r e d t h r e e t i m e s and t h i r t e e n w e r e r e p e a t e d t w i c e .

(3)

No. of HAPAX (x .1o')

12- ii- i0, 9'

8.

?,

6- 5- 4- 3- 2- i- 0

2 3 4 5 6 7 8

Figure i. String Length

T h i s i s a r e v e a l l ~ m e a s u r e o f t h e Amount o f r e p e t i t i o n i n a t e x t o f t h i s l e n g t h . I n p a r t i c u l a r , r e c u r r i n g two word s t r i n g s a c c o u n t f o r 4 0 . 8 p e r c e n t o f t h e r u n n i n g t e x t a n d r e - c u ~ r i n g t h r e e word s t r i n g s c o m p r i s e 8 . 1 p e r c e n t o f t h e t e x t .

The b a s i c a s s u m p t i o n of t h e f r e q u e n t o c c u r r e n c e o f l e x i c - a l c o l l o c a t i o n i n n a t u l - a l l a n g u a g e t e x t s , e s p e c i a l l y i n

h , - , a n - o c m p u t e r c o m m u n i c a t i o n , i s t h e b a s i s f o r t h e d e v e l o p m e n t o f a new t y p e of n a t u r a l l a n g u a g e p r o c e s s o r . F o r d , 1981, h a s c o n s t r u c t e d a n a t u r a l l a n g u a E e p r o c e s s i n g s y s t e m f o r d a t a b a s e u p d a t i n g , r e t r i e v a l , and m a n i p u l a t i o n t which r e l i e s c r i t i c a l l y on t h e o b s e r v a t i o n t h a t r e a l u s e r s t e n d t o employ a v e r y l i m i t e d s e t .of l e x i c a l d t r i ~ t y p e s i n q u e r y i ~ d a t a b a s e s .

The F o r d n a t u r a l l a n g u a g e p r o c e s s o r c o n s i s t s o f a two s t a g e r e d u c t i o n a l g o r i t h m f o r ~ r a n s l a t i n g n a t u r a l l a n g u a E s i n - p u t s i n t o b a s i c f u n c t i o n s which a r e t h e n u s e d t o p e r f o r m t h e q u e r y . The f i r s t s t a g s o f t h e r e d u c t i o n c h a n g e s t h e i n p u t words t o m e a n i n g r e p r e s e n t a t i o n s u s i n E a l i s t o f l e x i o a l i t e m s a n d a m e a n i n g c o r r e l a t e l i s t . The s e c o n d s t a g e t a k e s a s i n p u t s t r i n g s of t h e s e m e a n i n g c o r r e l a t e s and o h a ~ e s them i n t o b a s i c l l s t .

409 n u m e r i c r e p r e s e n t a t i o n s f o r words mapped down t o 132

[image:3.612.161.443.184.317.2]
(4)

t o 19 f u n c t i o n s . T h i s two s t a g e r e d u c t i o n scheme worked e f f i c i e n t l y enough t o r e s p o n d t o 9 3 . 8 p e r c e n t o f t h e

1697

i n - p u t q u e r i e s , i n c l u d i n g u n g r a m m a t i c a l ones from i n e x p e r i e n c e d u s e r s , w i t h a r e s p o n s e t i m e o f 1.5 s e c o n d s , o p e r a t i n g i n a n e n v i r o n m e n t o f 90K 8 - b i t b y t e s . T h i s compares v e r y f a v o r a b l y t o Thompson 1981 where o n l y 6 7 . 7 p e r c e n t o f REL q u e r i e s were c o r r e c t l y p a r s e d w i t h a n a v e r a g e r e s p o n s e t i m e o f 10 s e c o n d s . (Space r e q u i r e m e n t s were n o t r e p o r t e d . ) S i m i l a r l y , Damerau 1981 and P a t r i c k 1981 r e p o r t a s u c c e s s r a t e f o r TQA o f 6 5 . 1 p e r c e n t i n p u t s c o r r e c t l y p a r s e d w i t h t h e t i m e r e q u i r e d t o p r o c e s s a s e n t e n c e t y p i c a l l y b e i n g 10 s e c o n d s .

The r e a s o n why t h e s y s t e m works so w e l l i n t e r m s o f a c c u r a c y , s p e e d , and s m a l l s t o r a g e r e q a i r e m e n t s i s b a s e d on t h e two s t a g s r e d u c t i o n t e c h n i q u e w h i c h , i n t ~ r n , i s b a s e d on t h e f a c t t h a t a g r e a t ma~v i n p u t s i n h u m a n - c o m p u t e r communic- a t i o n

are repetitious syntactically, semantically, and lexl-

~ally. Repetition is a principal characteristic of human-

-computer

c o m m u n i c a t i o n .

REFERENCES

Becket, Joseph, 1975. "The Phrasal Lexicon," in Schank, Roger

a n d B o n n i e N a s h - W e b b e r , e d s . T h e o r e t i c a l I s s u e s i n N a t u r a l Language P r o c e s s i n g , ACL Workshop, Cambridge, HA, p p . 3 8 - 4 1 . B t e n s t o o k , D a n i e l a n d R a o u l H . ~ n i t h . I n p r e p a r a t i o n . " L e x i c a l

Collocation in Three Types of Texts."

Damerau, F r e d e r i c k J . 1971. Marker Models a n d L i n g u i s t i c

T h e o l . Janua linguarum series minor 95. Mouton: The Hague.

Dsmerau, Frederick J. 1981. "Operating Statistics for the

Transformational Question Answering System", AJCL 7.1. :30-42

F o r d , W. R a n d o l p h . 1981. " N a t u r a l - L a n g u a g e P r o c e s s i n g by

Cemputer - A New A p p r o a c h " , U n p u b l i s h e d P h . D . d i s e e r t a t i o n , The J o h n s Hopkins U n i v e r s i t y , B a l t i m o r e , H a r y l a n d ,

(5)

P e t r i o k , S t a n l e y . 1981. " F i e l d T e s t i n g t h e T r a n s f o r m a t i o n a l

Q u e s t i o n Answering (TQA) S y s t e m , "

P r o c e e d i n g s o f t h e N i n e -

t e e n t h Annual M e e t i n 6 o f t h e A s s o c i a t i o n f o r C o m p u t a t i o n a l

~

,

pP.

35-36.

Reiger,

Chuck.

1977. "Viewing Parsing as

Word

Sense Discrimin-

ation," in Dinswall, Williem O. A Surve~ of Linguistic

Science, Greylock Publishers.

Small, Steven. 1980. "Word Expert Parsing" A Theory of

D i s t r i b u t e d Word-Based N a t u r a l Language U n d e r s t a n d i n g , "

University of Maryland Computer Science TR-954.

Thompson, Bo~ena H. 1981. "Evaluation of Ratural Language

I n t e r f a c e s t o Data Base S y s t e m s , " Proceedin~m o f t h e

N i n e t e e n t h Annual Meeting o f t h e A s s o c i a t i o n f o r Computat-

ional Lin6uistics. Menlo Park, CA: Association for Canput-

atlonal Linguistics, pp. 39-42.

Thc|apson, Bo~ena H. and Frederick B. Thompson. 1981. "Shifting

to a Higher Gear in a Natural Language System," Proceedln~s

of the 1981 National Computer Conference Arlington, VA:

Figure

Figure i. String Length

References

Related documents

neighborhood as a NATIONAL HISTORIC PARK will attract more than 300,000 visitors each year, create 350 jobs annually, $15 million in annual wages, and sustain $40 million

Figures from the Norwegian Health Surveys indicate that there has been an increase in both the proportion of people who drink alcohol (the proportion who have drunk

As a result of the numerous changes to California’ s juvenile justice system, two practices and circumstances have important impacts on the budget – a significant decrease in the

The SkyServer is a web site and SQL database containing the Sloan Digital Sky Survey early data release (about 14 million celestial objects and 50 thousand spectra.) It is

This study will provide valuable data about patients ’ , family and professional caregivers ’ experiences with vari- ous IPC initiatives, including quality of care, quality of

Identification of weevil and fungal pathogen associated with sweet potato diseases is important to help in control strategy to avoid epidemic diseases that may cause loss of

The condensed consolidated interim financial information does not include all the necessary information for preparing financial statements for a full accounting year and

The captured data using the data collector explained in the last section, despite the potential demonstrated there, is not so useful if visualization tools are not developed in order