INTERACTIVE NATURAL LANGUAGE PROBLEM SOLVING: A PRAGMATIC APPROACH
A. Biermann, R. Rodman, B. Ballard, T. Betancourt, G. Bilbro, H. Deas, L. Fineman, P. Fink, K. Gilbert, D. Gregory, F. Heidlage
* Department of Computer Science, Duke University, Durham, North Carolina
** Department of Computer Science, North Carolina State University, Raleigh, North Carolina
ABSTRACT
A class of natural language processors is described which allow a user to display objects of interest on a computer terminal and manipulate them via typed or spoken English sentences. This paper concerns itself with the implementation of the voice input facility using an automatic speech recognizer, and the touch input facility using a touch sensitive screen. To overcome the high error rates of the speech recognizer under conditions of actual problem solving in natural language, error correction software has been designed and is described here. Also described are problems involving the resolution of voice input with touch input, and the identification of the intended referents of touch input.
To measure system performance we have considered two classes of factors: the various conditions of testing, and the level and quality of training of the system user. In the paper a sequence of five different testing situations is observed, each one resulting in a lowering of system performance by several percentage points below the previous one. A training procedure for potential users is described, and an experiment is discussed which utilizes the training procedure to enable users to solve actual non-trivial problems using natural language voice communication.
INTRODUCTION
A class of natural language processors is under development which allow a user to display objects of interest on a computer terminal and manipulate them via typed or spoken English imperative sentences. Such a processor is designed to respond within one to four seconds by executing the input command and updating the displayed world for user verification. If an undesired action is observed, a "backup" command makes it possible to undo any action and return the system to a previous state. The domains of interest include matrix computation, where one can display tables of data and manipulate them; office automation, where one can work with texts, files, calendars, or messages; and machine control, where one might wish to command a robot or other equipment via natural language input.
The first such system (Biermann and Ballard [6]), called NLC, provides a matrix computation facility and allows users to display matrices, enter data, and manipulate the entries, rows, and columns. It became operative in 1979 and includes a variety of special purpose features including arbitrarily deep nesting of noun groups, extensive conjunction processing, user defined imperative verbs, and looping and branching features. More recently, a domain independent abstraction of the NLC system has been constructed and now is being specialized to handle a text processing task. In this system, text can be displayed and modified or formatted with natural language commands.
This work was supported by National Science Foundation Grants MCS 7904120 and MCS 8113491, by the IBM Corporation under GSD agreement no. 260880, and by the Université de Paris-Sud, Laboratoire de Recherche en Informatique, during the summer of 1982.
Current work emphasizes the addition of voice input, voice output, and a touch sensitive display screen. Speech recognition is being done on an experimental basis with the Nippon Electric DP-200 Connected Speech Recognizer in both discrete and connected speech modes, and with the Votan Corporation V-5000 Development System. The touch sensitive screen being used is a Carroll touch panel mounted on a 19-inch color monitor. Voice response is also provided by the Votan V-5000 which assembles and vocalizes digitally recorded human voice messages. The work has progressed to the point where our natural language matrix computer NLC is operative under voice control using the DP-200 and the text processing system is beginning to function using the V-5000 speech recognizer. The touch panel interface and voice response systems are still in the design phase.
The goal of the project is to make possible voice and touch interactions of the following kind:
Retrieve file Budget83.
Find the largest number in this column and zero it. (with touch input)
Add this column putting the result here. (with two touch inputs)
Send this file to Jones and file it as Budget83. (touch input)
That is, imperative sentences are to be processed that operate on domain objects to produce modifications to the existing objects or their relationship to each other. The objects are, for example, rows, columns, numbers, entries, labels, etc. in the matrix domain or sections, paragraphs, sentences, margins, pages, etc. in the text processing domain. The execution of each command is accompanied by an update of the displayed data with highlighting to indicate changes. Prompts and error messages will be given by voice response. System design is aimed at allowing fast interactive control of the objects on the screen while the user maintains uninterrupted eye contact with the events as they happen.
A continuous program of human factors testing has been maintained by the project in order to build a realistic view of potential users and to measure progress in achieving usability. For example, in a test of the matrix computation system with typed input, twenty-three subjects solved problems similar to those that might be assigned in a first course in programming (Biermann, Ballard, and Sigmon [7]). In this test, the NLC system correctly processed 81 percent of the sentences and users were quite satisfied with its general performance. Other tests of the system are described in Fink [14] and Geist et al. [15]. In another test (Fineman [13]), a simulator for a voice driven office automation system was used to obtain data on user behaviors when problem solving is done with discrete and slow connected speech. It was found that users quickly adapted their speech to the required discipline of slow, methodical, and simple sentences which can be recognized by machine. Since the data obtained in any system test is heavily dependent on the amount and kind of training given to subjects, it is necessary to have a standardized training procedure. In the current work, a voice tutorial has been developed for training users to use a voice interactive system (Deas [11]).
This paper reports on the current status of these projects with emphasis on system design, speech input facilities and their performance, the touch input system, and human factors considerations.
SYSTEM OVERVIEW
The basic system design includes modules to do the following tasks:
(1) token acquisition
(2) parsing
(3) noun group resolution
(4) imperative verb execution
(5) flow-of-control semantics
(6) system output
[...] verb. The flow-of-control semantics module manages the execution of meta-imperative verbs such as [...], and handles user-defined imperatives. Finally, system output displays the state of the world on the screen. Any module may issue prompts and error messages via text or spoken output. Backup from any given module to an earlier stage may occur in unusual situations. More details appear in Ballard [1], Biermann [5], Biermann and Ballard [6], and Ballard and Biermann [3].
SPEECH INPUT
An automatic speech recognizer such as the DP-200 or V-5000 recognizes speech by means of pattern matching algorithms. A subject is introduced to the device for a training session, and asked to repeat the various words of the vocabulary into a microphone. The device extracts and stores bit patterns corresponding to each vocabulary word uttered by that particular speaker. After training, when a speaker wishes to use the device, the appropriate bit patterns are loaded. Each utterance of the speaker is compared with the pre-stored bit patterns and the best match above a threshold limit is presented as the recognized word. Depending on the device being used, the speaker may be required to talk with discrete or connected speech. The results described below were obtained primarily in the discrete mode with a pause of at least 200 milliseconds after each word.
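The matching step can be illustrated with a minimal sketch. It assumes that each template is an integer-encoded bit pattern and that closeness is measured by Hamming distance, so "best match above a threshold" becomes "smallest distance within a limit". The internals of the DP-200 and V-5000 are not described here, and the names `hamming`, `recognize`, and `max_distance` are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded patterns."""
    return bin(a ^ b).count("1")


def recognize(utterance: int, templates: Dict[str, int],
              max_distance: int) -> Optional[str]:
    """Return the closest vocabulary word, or None to signal a rejection."""
    best_word, best_dist = None, None
    for word, pattern in templates.items():
        dist = hamming(utterance, pattern)
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    # Assumed rejection rule: no template is close enough to the utterance.
    if best_dist is None or best_dist > max_distance:
        return None
    return best_word


# Toy templates standing in for one speaker's trained patterns.
templates = {"add": 0b10110010, "row": 0b01001101}
```

Raising `max_distance` in this sketch trades rejections for substitutions and insertions, which is the tension the error handling described below has to manage.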
Error Handling
The major difficulty facing users of automatic speech recognition equipment is the high error rate. Even the best devices in the best of circumstances are not entirely free of error, and when circumstances are less than optimal, and more like the real world, the error rate rises. Thus, a good part of the project effort has gone into coping with errors in recognition. In our view the speech recognition device is a component of the larger natural language computing system, and our goal is to reduce the system error rate as much as possible. We have therefore designed error correction software that corrects for certain kinds of errors, and error messages that elicit repetition from the human subject in less tractable cases. Error correction essentially functions by starting with a sequence of word guesses from the input system and filtering out the meaningless alternatives at the appropriate stages of processing. Beginning in the token acquisition phase, certain unacceptable word sequences can be disallowed. For example, a noun such as "matrix" or "row" would be disallowed as the first word in the sentence since this is illegal in the system grammar. In the parsing phase, a grammatical sequence of words is selected from the incoming sets of word guesses. Thus all ungrammatical word sequences are eliminated. The parser also disallows phrases containing certain semantically unacceptable relationships such as
the second row in 6.
or phrases containing disallowed operations such as
Add the matrix to 6.
In the noun group processor and later stages, various other semantic errors can be eliminated such as references to nonexistent objects or impossible operations.
For discrete mode operations, errors are classified into four types:
a. Substitutions. The device reports word B when word A was actually spoken.
b. Rejections. The device sends a rejection code when a vocabulary word was spoken.
c. Insertions. The device reports a vocabulary word when a non-vocabulary word, or noise, was uttered.
d. Fusions. Two (or more) words are spoken but only one word is reported.
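The four categories can be restated as a small classifier, given here only as an illustrative sketch. It assumes that the words actually uttered are known (as they are in a scripted test), that a single report is compared against them, and that the rejection code is represented by `"*"`. The names `RecognitionError` and `classify` are hypothetical.

```python
from enum import Enum
from typing import List, Optional, Set

REJECT = "*"  # assumed stand-in for the device's rejection code


class RecognitionError(Enum):
    SUBSTITUTION = "substitution"
    REJECTION = "rejection"
    INSERTION = "insertion"
    FUSION = "fusion"


def classify(spoken: List[str], reported: str,
             vocabulary: Set[str]) -> Optional[RecognitionError]:
    """Classify one device report against what was actually uttered.

    Returns None when the device behaved correctly.
    """
    # Several words uttered but a single word reported.
    if len(spoken) > 1 and reported != REJECT:
        return RecognitionError.FUSION
    word = spoken[0]
    if word not in vocabulary:
        # Noise or an unknown word should be rejected.
        return None if reported == REJECT else RecognitionError.INSERTION
    if reported == REJECT:
        return RecognitionError.REJECTION
    return None if reported == word else RecognitionError.SUBSTITUTION
```

How to label a rejection code returned for several fused words is not stated in the text; the sketch treats it as an ordinary rejection.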
Substitution Errors
Substitution errors are the easiest to correct since the substituted word often resembles the actual word phonetically. Some of the substitutions are fairly predictable, e.g. "by" for "five", "and" for "add", or "up" for "of". We have coined the term synophone to describe such sets. Many synophone pairs are symmetrically interchangeable; however, some are not. For example, with some speakers, the word "a" is frequently reported as "eight" although the converse seldom occurs.
[...] list is compiled. Passing the complete set of synophones for each word to the parser would result in excessive parse time so it is desirable to eliminate beforehand any synophones whose occurrence can be determined to be impossible based on grammatical or contextual considerations. For example the syntax of English (and of NLC) prevents certain words from occurring next to each other, or beginning or ending sentences. This information is recorded in a table of adjacencies. If there is a synophone in a word slot that cannot be preceded by any of the synophones in the previous word slot, that synophone is deleted. This process is repeated until no more deletions are possible. On average, roughly one-half of the candidate synophones are deleted. Since parsing time may increase exponentially with the number of candidate synophones, and this table driven elimination process is very quick, considerable savings result.
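The table-driven elimination can be sketched as follows. This is a minimal illustration, not the NLC implementation: the adjacency table is assumed to be a set of permitted (previous word, next word) pairs, the sentence boundary is represented by a hypothetical `START` marker, and only the predecessor check described above is applied.

```python
from typing import List, Set, Tuple

START = "<s>"  # hypothetical marker for the start of a sentence


def prune_synophones(slots: List[Set[str]],
                     allowed: Set[Tuple[str, str]]) -> List[Set[str]]:
    """Delete candidates that no candidate in the previous slot can precede.

    Repeats until a full pass makes no further deletions.
    """
    slots = [set(slot) for slot in slots]  # work on copies
    changed = True
    while changed:
        changed = False
        previous = {START}
        for slot in slots:
            keep = {w for w in slot
                    if any((p, w) in allowed for p in previous)}
            if keep != slot:
                slot.intersection_update(keep)
                changed = True
            previous = slot
    return slots


# Toy adjacency table; the real table would be derived from the grammar.
allowed = {(START, "add"), ("add", "row"), ("and", "row"),
           ("row", "five"), ("of", "five")}
slots = [{"add", "and"}, {"row", "of"}, {"five", "by"}]
pruned = prune_synophones(slots, allowed)
```

In this toy run "and" is removed because it cannot begin a sentence, which in turn leaves "of" and "by" without a permitted predecessor.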
For reasons of individual speech variation some vocabulary words will have synophones peculiar to an individual speaker. The set of synophones of each vocabulary word is therefore augmented to accommodate this situation so that each speaker has personalized synophone sets. Early training includes a tutorial introduction, part of which requires the subject to repeat sentences word for word. In this mode, the software has a priori knowledge of the correct token for each word slot. If a given word slot does not contain the correct token, the substituted word can be added to the appropriate synophone set for that subject. Thereafter, if the same substitution error recurs during a session with that subject, the correct word will be included in the synophone list for that word slot.
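A minimal sketch of this per-speaker bookkeeping is given below. It assumes the personalized sets are stored as a map from a reported word to the words the speaker may actually have said, and that `"*"` stands for a rejection; the class name `SpeakerSynophones` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

REJECT = "*"  # assumed stand-in for the device's rejection code


class SpeakerSynophones:
    """Per-speaker record of observed substitutions (illustrative only)."""

    def __init__(self) -> None:
        # reported word -> words this speaker actually said
        self.confusions: Dict[str, Set[str]] = defaultdict(set)

    def learn(self, expected: List[str], recognized: List[str]) -> None:
        """Compare a tutorial sentence with what the recognizer reported."""
        for want, got in zip(expected, recognized):
            if got != want and got != REJECT:
                self.confusions[got].add(want)

    def candidates(self, reported: str) -> Set[str]:
        """Words to place in the slot when `reported` is recognized."""
        return {reported} | self.confusions.get(reported, set())


speaker = SpeakerSynophones()
speaker.learn(["add", "a", "row"], ["add", "eight", "row"])
```

After this tutorial sentence, a later report of "eight" from the same speaker yields the candidates "eight" and "a", while a report of "a" yields only "a", reflecting the asymmetry noted earlier.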
Rejection Errors
The occurrence of one or more rejections in a sentence almost always results in a request for repetition. However, we are designing a number of facilities to handle rejections. In some cases, the rejected word can be determined from context, and processing can continue uninterrupted. Otherwise, the current plan is to handle a single rejection by returning an audio response that repeats all of the sentence with the word "what" in place of the rejected element. The speaker will then be able to choose to repeat the rejected word or, in case other errors are apparent, to repeat the entire utterance.
In cases of multiple rejection errors, the speaker is requested to repeat the entire utterance. In all cases previous utterances will not be discarded. The scanner will merge them, complete with synophones, in an attempt to eliminate rejections and provide the broadest amount of information from which to extract what the speaker actually said. For example, if the actual utterance were
A B C D E F G
and the recognizer returned
A B * Z E * G
where * stands for rejection, the speaker will be asked to repeat. If
A B C * E F H
is then recognized, it will be combined with the first utterance so that the scanner considers the seven word slots to contain:
s(A) s(B) s(C) s(Z) s(E) s(F) s(G) ∪ s(H)
where s(X) is the union of X with its synophones. (Hopefully D is in s(Z).) If subsequent utterances are so different from previous ones that they are unlikely to be word-for-word repetitions (for example, by containing a different number of words), previous utterances will be discarded and processing will be started over.
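The merge of a repeated utterance with its predecessor can be sketched as below. The sketch assumes that a rejection is represented by `"*"`, that synophone sets are available as a dictionary, and that a differing word count is the only test for "too different to be a repetition" (the text gives it as one example). The function names are hypothetical.

```python
from typing import Dict, List, Set

REJECT = "*"  # assumed stand-in for the recognizer's rejection code


def expand(word: str, synophones: Dict[str, Set[str]]) -> Set[str]:
    """s(X): the word together with its synophones; empty for a rejection."""
    if word == REJECT:
        return set()
    return {word} | synophones.get(word, set())


def merge_utterances(first: List[str], second: List[str],
                     synophones: Dict[str, Set[str]]) -> List[Set[str]]:
    """Combine two recognitions of the same sentence slot by slot."""
    if len(first) != len(second):
        # Unlikely to be a word-for-word repetition: start over.
        return [expand(w, synophones) for w in second]
    return [expand(a, synophones) | expand(b, synophones)
            for a, b in zip(first, second)]


# The example from the text, assuming D happens to be a synophone of Z.
synophones = {"Z": {"D"}}
merged = merge_utterances("A B * Z E * G".split(),
                          "A B C * E F H".split(), synophones)
```

With these inputs every slot is non-empty after the merge, the fourth slot contains both Z and D, and the last slot contains both G and H.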
It may also be possible to predict a rejected word with some degree of certainty based on semantic or pragmatic information. (We consider pragmatics to involve discourse dependent contextual factors.) For example suppose the scanner receives from the recognizer:
Double * nine and add column four to it.
The most likely possibilities for the rejection are entry, row and column. Entry can be eliminated on semantic grounds since it is meaningless to add a column to an entry. Row is semantically possible, but pragmatically less likely than column since adding columns to columns is much more common than adding columns to rows. Thus column may be chosen. Furthermore if the matrix in focus is six by seven, then the nine is a substitution error, and the sentence will be rejected on pragmatic grounds initially. However, since five is a synophone of nine the sentence will be tried with five in the place of nine. Ultimately the user will see displayed on the screen the result from:
Double column five and add column four to it.
The activity described above is transparent to the user. If the results are unsatisfactory to the user, the command "backup" will undo them.
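The out-of-range repair in this example can be illustrated with a short sketch. It assumes that number words map to integers through a small table, that the limit is the relevant matrix dimension (seven columns in the example), and that the first in-range synophone in alphabetical order is taken. The names `NUMBER_VALUES` and `repair_index` are hypothetical.

```python
from typing import Dict, Optional, Set

# Toy table of number words; a real vocabulary would be larger.
NUMBER_VALUES = {"four": 4, "five": 5, "six": 6, "seven": 7, "nine": 9}


def repair_index(word: str, limit: int,
                 synophones: Dict[str, Set[str]]) -> Optional[str]:
    """Return `word` if it names an existing row or column, else try synophones."""
    if NUMBER_VALUES[word] <= limit:
        return word
    for alt in sorted(synophones.get(word, set())):
        if alt in NUMBER_VALUES and NUMBER_VALUES[alt] <= limit:
            return alt
    return None  # no plausible reading; the sentence stays rejected


# "nine" cannot name a column of a six by seven matrix; "five" can.
repaired = repair_index("nine", 7, {"nine": {"five", "by"}})
```

A `None` result corresponds to the case where no synophone rescues the sentence and repetition must be requested.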
An additional source of pragmatic error correction comes from utterances in historically similar dialogs. We are developing a method for utilizing this type of information. Considering the last example, if the user had been adding columns to rows quite frequently in the current and/or recent sessions, but rarely if ever adding columns to columns, the system would choose row as the rejected word.
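Since the method is described as still under development, the following is only one conceivable sketch of it. It assumes that dialog history has been reduced to counts of how often each candidate word filled the rejected position in similar past commands, and that candidates arrive ordered by default plausibility. The function name `choose_rejected_word` is hypothetical.

```python
from collections import Counter
from typing import List


def choose_rejected_word(candidates: List[str], history: Counter) -> str:
    """Pick the candidate seen most often in similar past commands.

    Ties, including an empty history, fall back to the first candidate,
    because max() returns the first maximal element it encounters.
    """
    return max(candidates, key=lambda word: history[word])


# Default ordering prefers "column"; a row-heavy history overrides it.
candidates = ["column", "row"]
no_history_choice = choose_rejected_word(candidates, Counter())
row_heavy_choice = choose_rejected_word(
    candidates, Counter({"row": 12, "column": 1}))
```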
Insertion Errors and Fusion Errors
Most speech recognizers allow adjustment of the threshold value that determines whether the best match is "recognized" or is rejected. Since rejections are harder to correct for than substitutions there is reason to lower this value. Too low a value, however, aggravates the insertion problem. When the speaker utters a non-vocabulary word, or emits a grunt or uncouth sound, the correct response is a rejection. A non-rejection in this situation may be difficult to deal with.
In our experience users have little trouble in confining themselves to the trained vocabulary. Most insertion errors occur between sentences, rather than between words within a sentence. This results in extraneous "words" in the first one or two word slots. These can often be eliminated because neither they nor their synophones can begin a sentence in the NLC grammar. Timing considerations, too, could be used to eliminate, or at least cast suspicion on, inter-sentence insertions, though we have not found the need for such measures.
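Dropping extraneous sentence-initial slots can be sketched as below, assuming each slot is a set holding the reported word and its synophones and that the grammar supplies a set of words that may begin a sentence. The limit of two dropped slots mirrors "the first one or two word slots"; the names `drop_leading_insertions` and `max_drop` are hypothetical.

```python
from typing import List, Set


def drop_leading_insertions(slots: List[Set[str]],
                            sentence_initial: Set[str],
                            max_drop: int = 2) -> List[Set[str]]:
    """Skip leading slots in which no candidate can begin a sentence."""
    skipped = 0
    while (skipped < min(max_drop, len(slots))
           and not (slots[skipped] & sentence_initial)):
        skipped += 1
    return slots[skipped:]


# A stray "of"/"up" before the real command "add row".
sentence_initial = {"add", "double", "retrieve"}
slots = [{"of", "up"}, {"add", "and"}, {"row"}]
cleaned = drop_leading_insertions(slots, sentence_initial)
```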
Raw Error Rate
Although a good deal of our interest is in correcting or compensating for the various kinds of errors in recognition, we are also working on ways to reduce the actual number of errors made by the recognition devices (the raw error rate). Careful vocabulary choice and proper tuning of the hardware such as threshold level selections are crucial factors.
It is important to choose vocabulary words as widely separated phonetically as circumstances allow. Additionally, we have found that words containing non-strident fricatives (e.g. the th in fifth), affricates (e.g. the ch in church), liquids (r and l) and nasals (m, n and ŋ) are more difficult to recognize than words containing other sounds. Monosyllabic words, in general, are not recognized as readily as polysyllabic ones, though words that are long and difficult to pronounce (e.g. anaesthetist) are also to be avoided. Often the domain leaves little latitude for vocabulary choice. If ordinal numbers are needed it is necessary to have fifth and sixth, which are difficult to distinguish. But instead of a word like rate which is easily confused with eight, tax rate or rate-of-pay (pronounced as a single word) might be a better choice.
Correct training procedures are instrumental in reducing the raw error rate as are such factors as whether the user receives immediate feedback from the recognizer, the form and frequency of error messages requesting repetition, and the degree of comfort felt by the user insofar as attitude toward computers is concerned. Some of these are discussed below in the section Measuring System Performance.
We have observed fusion errors in discrete mode. They arise when the speaker neglects to pause long enough between words. In our experience they occur so infrequently we have not tried to compensate for them. This type of error is more crucial when operating in connected mode. It may be the case that two (or possibly more) words are reported as a single word different from either of the two originally uttered words. It may also happen that two words, A and B, are reported as either A or B. In this case the fusion error takes on the appearance of an omission. Our connected speech parser, currently under construction, will have the ability to guess an omission and insert a correction if sufficient contextual information is available.
Some Miscellaneous Questions
Apart from error correction, a number of other questions have arisen during our implementation of the voice driven system. Among these are:
a) How is the beginning of a sentence detected?
b) How is the end of a sentence detected?
c) How can a user make a correction in mid-sentence?
Currently a sentence begins with any input after the end of the previous sentence. The instances of inter- or pre-sentence insertions were discussed above.
Sentences are terminated by the metaword over. This word has few synophones in the current word set and has the advantage of being widely understood to mean "end of transmission." However, we plan to experiment with other kinds of termination such as use of touch input or timing information.
A user may misspeak in instructing the computer to perform a task and may wish to repeat all or part of the command. Also, if the words from the voice recognizer are displayed as they are spoken, the user may desire to correct a misrecognition. The metaword correction is currently used to implement this facility. There are several levels of correction. Some may be accomplished by the scanner, while others require more information than is available to the scanner and must therefore be handled by the parser. The simplest type of correction consists of changing one word at the end of the sentence:
Add row one to row four correction three.
Here the scanner merely deletes the word slot before the metaword. If several words follow "correction" as in
Add row one to row two correction row one to column three.
the scanner detects this fact and scans backward in the sentence, attempting to match the largest possible number of word slots before and immediately after the metaword. In this example the tokens for row, one and to match, so the scanner copies the last part of the sentence into the earlier part of the buffer to arrive at
Add row one to column three.
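The scanner's handling of the metaword can be sketched as follows. This is an illustration under stated assumptions rather than the NLC scanner: each word slot is reduced to a single lowercase token (synophones are ignored), the longest run in the earlier text that matches the start of the replacement text is chosen, and ties go to the run nearest the metaword. The function name `apply_correction` and its return convention (edited tokens plus the index of a slot flagged for the parser, or `None`) are hypothetical.

```python
from typing import List, Optional, Tuple

METAWORD = "correction"


def apply_correction(tokens: List[str]) -> Tuple[List[str], Optional[int]]:
    """Resolve one "correction" metaword; return (tokens, flagged index)."""
    if METAWORD not in tokens:
        return tokens, None
    cut = tokens.index(METAWORD)
    before, after = tokens[:cut], tokens[cut + 1:]
    if not after:
        return before, None
    # Scan backward for the start whose run matches the longest prefix of `after`.
    best_start, best_len = None, 0
    for start in range(len(before) - 1, -1, -1):
        k = 0
        while (start + k < len(before) and k < len(after)
               and before[start + k] == after[k]):
            k += 1
        if k > best_len:
            best_start, best_len = start, k
    if best_start is not None:
        # Copy the replacement text over the earlier part of the buffer.
        return before[:best_start] + after, None
    # No overlap: delete the token just before the metaword.
    edited = before[:-1] + after
    flagged = len(before) - 2 if len(after) > 1 and len(before) >= 2 else None
    return edited, flagged


single = apply_correction(
    "add row one to row four correction three".split())
overlap = apply_correction(
    "add row one to row two correction row one to column three".split())
```

When several words follow the metaword but none of them match the earlier text, the sketch returns a flagged index so that a later stage can decide how much more to delete.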
In the c a s e of an u t t e r a n c e such as
Add row one to row two c o r r e c t i o n c o l u m n three.
it is i m p o s s i b l e to m a t c h the tokens b e f o r e and a f t e r the m e t a w o r d . The s c a n n e r t h e r e f o r e d e l e t e s the t o k e n [~Ime,]iately b e f o r e the m e t a w o r d , flags the w o r d slot p r e c e d i n g that t o k e n and p a s s e s the result to the parser. In the example,
Add row one to row c o l u m n three. is passed, w i t h the word slot c o n t a i n i n g ro w flagged. The p a r s e r a t t e m p t s to make
185
s e n s e of the set of t o k e n s p a s s e d . If it cannot, the f l a g g e d w o r d s l o t is d e l e t e d , t h e w o r d p r e v i o u s to it is f l a g g e d a n d a n o t h e r p a r s e is a t t e m p t e d . T h e p r o c e s s is r e p e a t e d u n t i l a s u c c e s s f u l P a r s e is found. If n o n e is found, an e r r o r m e s s a g e is issued. T h u s in the e x a m p l e , a f t e r f a i l i n g tO P a r s e the t o k e n s as passed, the p a r s e r t r i e s
A d d r o w o n e to c o l u m n three. w h i c h is p a r s e d s u c c e s s f u l l y .
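The three levels of correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NLC implementation: the names scan, toy_parse and correct are hypothetical, tokens are plain lowercase words rather than word slots, and toy_parse is a stand-in grammar that accepts only sentences of the form "add <row|column> <number> to <row|column> <number>".

```python
METAWORD = "correction"
NUMBERS = {"one", "two", "three", "four"}
KINDS = {"row", "column"}


def scan(tokens):
    """Scanner pass (hypothetical). Returns (tokens, flagged), where flagged is
    the index of a word slot passed to the parser as doubtful, or None."""
    if METAWORD not in tokens:
        return list(tokens), None
    i = tokens.index(METAWORD)
    before, after = tokens[:i], tokens[i + 1:]
    if len(after) == 1:
        # One word follows the metaword: it replaces the word slot before it.
        return before[:-1] + after, None
    # Several words follow: scan backward for the longest run of tokens before
    # the metaword that matches the tokens immediately after it.
    for k in range(len(after), 0, -1):
        for j in range(len(before) - k, -1, -1):
            if before[j:j + k] == after[:k]:
                return before[:j] + after, None
    # No match: delete the token just before the metaword and flag the word
    # slot preceding it, leaving the decision to the parser.
    kept = before[:-1]
    flagged = len(kept) - 1 if kept else None
    return kept + after, flagged


def toy_parse(tokens):
    """Stand-in for the real parser (assumption): one fixed sentence pattern."""
    return (len(tokens) == 6 and tokens[0] == "add"
            and tokens[1] in KINDS and tokens[2] in NUMBERS
            and tokens[3] == "to"
            and tokens[4] in KINDS and tokens[5] in NUMBERS)


def correct(sentence, parse=toy_parse):
    """Apply scanner correction, then the parser's delete-and-retry loop."""
    tokens, flagged = scan(sentence.lower().rstrip(".").split())
    while not parse(tokens):
        if flagged is None or flagged < 0:
            raise ValueError("no successful parse found")
        # Delete the flagged slot, flag the word before it, and try again.
        del tokens[flagged]
        flagged -= 1
    return " ".join(tokens)


for s in ("Add row one to row four correction three.",
          "Add row one to row two correction row one to column three.",
          "Add row one to row two correction column three."):
    print(correct(s))
```

The third sentence exercises the parser-level loop: the first parse of "add row one to row column three" fails, the flagged "row" is deleted, and the second attempt succeeds.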
TOUCH INPUT
An important aspect of natural language communication is pointing, which is often used in connection with words such as "this", "that", "here" and "there". Pointing may function as emphasis, as in
    Put the dog out.
where either the dog, the outside, or possibly both are pointed to. Pointing also functions to put objects into focus, allowing subsequent references to use a definite pronoun: for example,
    Move that there and cover it.
with a point to the object to be moved and covered.
A pointing ability would fit in very nicely with voice driven NLC and our project includes a touch sensitive screen so that the user can say "double this", point to a row, and cause the processor to double every element in that row. More complex sentences such as
    Add this row to that row putting the results here. (with three touches)
also become possible.
Apart from being "natural" in the sense that ordinary language users point often, pointing may increase the efficiency of communication. We wrote solutions to eight problems, including Gaussian elimination, divided differences and matrix inversion, using NLC without touch. We then went back and rewrote the solutions using the touch facility, but without any other changes. On the average 29% fewer words were needed to solve the problem, and individual sentences were shortened by 23%.
A number of interesting problems arise when a touch facility is implemented. One is how to pair up tactile and verbal input in the way intended by the user. Another problem is identifying the actual object the user intends to refer to once the tactile and verbal input have been resolved.
An example of the latter problem would be the command
    Double this
accompanied by a touch of element <3,2> of a displayed matrix. Does the user want to double element <3,2>, double row 3, double column 2, or even double the entire matrix? The same touch paired with
    Double this entry.
    Double this row.
    Double this column.
or
    Double this matrix.
would be unambiguous. If the demonstrative is not accompanied by a nominal, some strategy is needed to process the sentence. We opt for the smallest possible noun group encompassed by the touch (the <3,2> entry in the above case), and rely on our "backup" facility in case the user's intentions are not fulfilled. If the utterance "double this" is accompanied by a touch of the displayed name of a row, column or matrix, then the named object will be referenced.
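The referent rule just described can be written down as a small sketch. The function names and the tuple encoding of referents are assumptions made for illustration only; the rule itself (use the nominal if one is spoken, otherwise the smallest noun group under the touch, and the named object if a displayed name is touched) is the one stated above.

```python
def resolve_entry_touch(row, col, nominal=None):
    """Referent for a touch on matrix element <row,col> (hypothetical encoding).

    nominal is the noun spoken after the demonstrative ("entry", "row",
    "column", "matrix"), or None for a bare "double this".
    """
    candidates = {
        "entry": ("entry", row, col),
        "row": ("row", row),
        "column": ("column", col),
        "matrix": ("matrix",),
    }
    if nominal is None:
        # No nominal: choose the smallest noun group encompassed by the touch.
        return candidates["entry"]
    return candidates[nominal]


def resolve_name_touch(kind, ident):
    """Referent for a touch on the displayed name of a row, column or matrix."""
    return (kind, ident)


print(resolve_entry_touch(3, 2))            # bare "double this"
print(resolve_entry_touch(3, 2, "row"))     # "double this row"
print(resolve_name_touch("column", 2))      # touch on a column's name
```

A wrong guess in the bare case is not fatal, since the text relies on the "backup" facility to undo it.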
Pairing up touches with spoken phrases is straightforward when a single noun group is used with a single touch, as in "double this entry." In a more complicated case we might have
    Add this entry to that row and put the result here.
accompanied by three touches. The strategy here is to pair touches and utterances in the order given by the user.
In the last example all touches functioned to establish focus or resolve noun group reference. If the emphasis function of touch is mixed in, a more difficult situation arises. If three touches accompany
    Add this entry to the first row and put the result here.
then the second touch was presumably to emphasize the first row or even to establish a rhythm of touching. In any case the facility to match touches with non-deictic expressions is needed. If only two touches accompany this last sentence then the focusing function should take precedence, and the touches should be matched with "this entry" and "here."
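The pairing strategy can be illustrated with a short sketch. It assumes the noun groups have already been extracted from the sentence in spoken order, and that a phrase counts as deictic when its first word is a demonstrative or "here"/"there". The function names and the error raised when the counts fit neither case are assumptions, not behavior described in the text.

```python
DEICTIC_WORDS = {"this", "that", "these", "those", "here", "there"}


def is_deictic(phrase):
    """True if the phrase starts with a demonstrative or locative pointer word."""
    return phrase.split()[0].lower() in DEICTIC_WORDS


def pair_touches(phrases, touches):
    """Pair touches with phrases in the order given by the user.

    With one touch per phrase, every phrase gets a touch (a touch on a
    non-deictic phrase acts as emphasis). With fewer touches, the focusing
    function takes precedence and only deictic phrases are paired.
    """
    deictic = [p for p in phrases if is_deictic(p)]
    if len(touches) == len(phrases):
        targets = phrases
    elif len(touches) == len(deictic):
        targets = deictic
    else:
        raise ValueError("touch count fits neither all phrases nor the deictic ones")
    return list(zip(targets, touches))


phrases = ["this entry", "the first row", "here"]
print(pair_touches(phrases, ["touch1", "touch2", "touch3"]))
print(pair_touches(phrases, ["touch1", "touch2"]))
```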
The situation is made even more complex by the ability to establish focus verbally. In NLC the user can say
    Consider row four.
    Double that row.
and the expression "that row" will refer to row four. If the same utterance is accompanied by a touch to a row other than four a potential conflict results. Our strategy is to give precedence to touch, since it is the more immediate focussing mechanism. Thus the sequence
    Consider row four.
    Double that row. (touching row three)
will result in the doubling of row three.
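A minimal sketch of this precedence rule follows. The Session class, its method names and the list-of-lists matrix are illustrative assumptions; rows are numbered from one as in the spoken commands.

```python
class Session:
    """Toy session state: one matrix plus the row currently in verbal focus."""

    def __init__(self, matrix):
        self.matrix = [list(row) for row in matrix]
        self.verbal_focus = None  # set by "Consider row n."

    def consider_row(self, n):
        """'Consider row n.' puts row n into verbal focus."""
        self.verbal_focus = n

    def double_that_row(self, touched_row=None):
        """'Double that row.' A touch, if present, overrides verbal focus."""
        target = touched_row if touched_row is not None else self.verbal_focus
        if target is None:
            raise ValueError("nothing is in focus")
        self.matrix[target - 1] = [2 * x for x in self.matrix[target - 1]]
        return target


s = Session([[1, 1], [1, 1], [1, 1], [1, 1]])
s.consider_row(4)
print(s.double_that_row(touched_row=3))  # touch takes precedence: row 3
print(s.double_that_row())               # no touch: verbal focus, row 4
```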
When both verbal and touch focus are present, nearly unresolvable ambiguities may result. The sequence
    Consider row four.
    Add this row to that row.
accompanied by one touch, gives rise to the problem as to which demonstrative noun group to associate with row four, and which to associate with the touch. One strategy is to associate with a demonstrative noun group the touch that occurred closest to the time of utterance. Another possible strategy is to assume that the expression with "that" refers to the more distant element in focus (the one established verbally in this case). This takes advantage of the fact that "this" and "that" can be distinguished in English grammar by the feature +NEAR. Unfortunately by a simple change in stress pattern a speaker can undo this fairly weak regularity. Thus the sequence
    Consider row four.
    Add this row to that row.
plus a single touch, where "this" bears primary stress and "that" bears secondary stress, should find the touch referring to "this row." If the stress pattern were
    Add this row to that row.
with primary stress on "Add", the touch would more likely be associated with "that row." It is unfortunate that to date we know of no voice equipment sensitive enough to distinguish between two such stress patterns.
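The nearness heuristic and its stress-based exception can be stated compactly. The sketch below is hypothetical throughout: it assumes a stress label is somehow available, which the text notes no known voice equipment could supply, and the function name and argument encoding are invented for illustration.

```python
def assign_this_and_that(touched_row, verbal_focus_row, primary_stress="this"):
    """Decide which row "this row" and "that row" denote when one touch
    accompanies 'Add this row to that row.' after 'Consider row n.'

    Default (+NEAR heuristic): "this" takes the touch, the nearer element in
    focus, and "that" takes the verbally established row. With primary stress
    on the verb "add", the touch is instead associated with "that row".
    """
    if primary_stress == "add":
        return {"this row": verbal_focus_row, "that row": touched_row}
    return {"this row": touched_row, "that row": verbal_focus_row}


print(assign_this_and_that(touched_row=2, verbal_focus_row=4))
print(assign_this_and_that(touched_row=2, verbal_focus_row=4, primary_stress="add"))
```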
Somewhat more complicated cases are possible:
    Consider row three.
    Add this row to that row and put the result in the first row.
accompanied by two touches. Since we allow a touch to occur with expressions such as "the first row," and since it is possible to disregard the element in verbal focus altogether, such a case produces multiple ambiguities. Although we foresee being able to resolve these ambiguities effectively, and can always fall back on our "backup" facility in case of mistakes, we also believe that such complex cases will be extremely rare. No sentence of such complexity was produced in our solutions to the eight problems mentioned above. With a voice and touch facility, sentences tend to be shorter and simpler.
NLC has implemented plurals, but we have not considered their use in touch input. Such sentences as
    Multiply these elements by this element.
or
    Add these elements up.
with multiple touches, would be useful. In the trial run of eight problems, the introduction of plurality resulted in up to fifty percent reduction in number of words needed and sentence length.
MEASURING SYSTEM PERFORMANCE
Progress in any endeavor is greatly aided if the level of accomplishment can be measured in some meaningful way. It is desirable to give a figure of merit for a system both so that a project can indicate to the world the degree of the achievement and also so that the project can internally judge its own improvements over time. In voice language processing, one can attempt to measure performance by the word and sentence error rates. However, experience shows that these measures are highly dependent on two factors and that almost any level of performance can be reached if those factors are appropriately adjusted. Those factors are
(a) the environment and type of test within which the measurement is made, and
(b) the level of training of the system user.
Type of Testing Environment
Considering (a), we tend to classify the type of test for a recognizer into one of the following five categories and we expect significant differences in device response in each case.
(1) Lists of words are read in tests performed by the manufacturer.
(2) Lists of words are read in our laboratory.
(3) Sentences are read in our laboratory. (discrete or connected)
(4) Sentences are uttered in a problem solving situation in our laboratory. (discrete or connected)
(5) Sentences are uttered in a problem solving situation in the user environment. (discrete or connected)
In the first situation, a manufacturer is interested in advertising the best performance achievable. Tests are performed in controlled conditions with microphone placement and all system parameters set for optimum performance, and an expert speaker is used. In our laboratory, we are not interested in the best possible system performance but rather what we can realistically expect. The parameters are set at medium levels, there is some ambient noise, the microphone may move during the test, and the user will be anyone we happen to bring in regardless of their speech characteristics.
as the sequential parts of each sentence are voiced. Training samples based on reading lists of vocabulary items tend to be inaccurate templates for words spoken in context. When sentences are spoken in a problem solving environment, situation (4), these effects increase and other aspects of word pronunciation change. When voice control stops being the central concern of the speaker, larger variations in speech are bound to occur with accompanying larger error rates.
The most difficult situation of all occurs in situation (5) where the user might not even be a person who could be brought into a voice laboratory. In this case, the user has only one concern, achieving the desired machine performance. Encouragement to speak carefully could be met with impatience, and a few system errors could result in even worse speech quality and further degraded performance.
Our experience has been that word error rates increase by about three to seven percent as one moves to each more difficult situation type, depending on the vocabulary, the equipment, and other factors. Consequently, we tend to distrust any figures gathered in the easier classes of environments and attempt to do our own testing in the more difficult and more interesting situations. Most of our recent data is of type (4) and we hope to gain some type (5) experience in the coming year.
Training the System User
The second major factor affecting voice recognition performance is the level of training of the system user. Humans are extremely adaptive and capable of learning behaviors to a high degree of perfection. Thus the designer of a voice system might, over the years, learn to chat with it like an old friend whereas others might not be able to use the system at all. Again, almost any level of system performance can be observed depending on the quality of training of the user.
Our approach to controlling this factor has been to develop a standardized training procedure and to only report statistics on uninitiated users whose experience with the system is limited to this procedure. Ideally this procedure would be administered by machine to obtain maximum uniformity in training but this has not yet been possible.
The training procedure has two parts. The first part is an informal session in which the user is told how to speak individual words to the system and examples of the complete vocabulary are collected by the recognition system. The second part is administered very mechanically by reading a tutorial document to the user and requesting the utterance of trial sentences. This portion of the training introduces the user to the interactive system's capabilities and is specifically designed to be administered by the machine.
Some Performance Data
An experiment was run during the summer of 1982 to obtain DP-200 performance data in an environment of type (4) as described above. Because no voice interactive system was yet available, a system simulation was used. After the first part of the training session in which the voice samples were collected, the subject was placed in a room behind a display terminal with a head mounted microphone. The voice tutorial was read to the subject through a loudspeaker at the terminal, introducing the capabilities of the simulated system and the types of voice commands that could be executed. The subject's commands were recognized by the DP-200 and executed by the simulation. Thus each user command resulted in either appropriate action visible on the screen or a voice error message. In the final portion of the experiment, the subject was asked to solve an invoice problem that involved computing costs for a series of individual items and finding the tax and total. The experiment gave a reasonably accurate simulation of the expected NLC system behavior when it becomes completely voice interactive. The experiment attempted to simulate a syntactic level of voice error correction but nothing deeper.
It was found that the DP-200 word error rate rose to about 20 percent in this test with about 14 of the 20 percent being automatically correctable. The vocabulary size was 80, with three samples of most words, and six samples of a few of the difficult words, stored in the DP-200. This means that roughly every two to four sentences will have a single word error not correctable at shallow levels. This data comes from the first two hours of usage for these subjects and we expect significant improvement as usage experience increases over time.
More recently, the NLC system has become operative in a voice driven mode and subject testing has begun using the same training procedure. It is too early to report results but it appears that the performance predicted in the simulation will be approximately achieved. This experiment will include longer usage by the subjects and thus indicate how much error rates decrease over time.
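The arithmetic behind the "every two to four sentences" figure can be checked directly from the two reported percentages. The sketch below uses only those numbers; the implied sentence lengths it prints are an inference from them, not data reported in the experiment, and the function names are illustrative.

```python
def residual_error_rate(word_error_rate, auto_correctable_rate):
    """Fraction of words left wrong after automatic (shallow) correction."""
    return word_error_rate - auto_correctable_rate


def words_per_residual_error(residual):
    """Average number of spoken words between uncorrected word errors."""
    return 1.0 / residual


residual = residual_error_rate(0.20, 0.14)   # about 6 percent of words
spacing = words_per_residual_error(residual)  # about one error per 17 words
print(f"residual word error rate: {residual:.2f}")
print(f"words per uncorrected error: {spacing:.1f}")

# One uncorrected error every two to four sentences then corresponds to an
# average sentence length of roughly four to eight words (inferred, not reported).
for sentences in (2, 4):
    print(f"{sentences} sentences per error -> {spacing / sentences:.1f} words per sentence")
```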
In conclusion, we have at this time only fragmentary information regarding what levels of performance can be achieved. However, we have developed some tools for making measurements and will report the results as they become available.
OTHER WORK
Much of the applied work in natural language processing has concerned database query (Bronnenberg et al. [8], Codd [9], Harris [17,18], Hendrix [22], Mylopoulos [27], Plath [29], Thompson and Thompson [32], Waltz [35], and Woods et al. [36]). At least one such system is being marketed (namely INTELLECT [18]), while several others have been successfully used in pilot studies (Damerau [10], Egly and Wescourt [12], Hershman et al. [24], Krause [25], Tennant [31]).
As described in this paper, our initial work with NLC involved programming as an application area, while our more recent interest has shifted toward office domains. However, as Petrick [28] observes, many of the same technical problems arise regardless of application area. For the most part, the imperative sentence structures we are dealing with are simpler than the question forms recognized by the database systems cited above, while our noun phrases tend to exhibit more elaborate structures. Furthermore, whereas typical database systems process each input separately, or perhaps seek to handle ellipsis by consulting the immediately preceding input, we build up a richer semantic context as a session proceeds to be used in handling matters such as focus and pronoun resolution.
The most distinctive features of our present work are (a) the inclusion of voice input and output facilities, and (b) an attempt to deal with relatively "deep" relationships among domain objects. A more detailed discussion of the domain-independent mechanisms appears in Biermann [5], and as described in Ballard [2] the related LDC project being conducted in our laboratory is built around many of these techniques. Similar research projects which are moving away from a fixed database setting include work by Haas and Hendrix [16], Heidorn [20], Hendrix and Lewis [23], and Thompson and Thompson [33].
During the 1970's a number of speech understanding systems were developed under ARPA support (Lea [26], Reddy [30], Walker [34], Woods [37]) and currently some systems are being built in other countries, for example [19]. However, none of these systems has been refined to the point that it could actually support user interactions in real time as we are attempting to do. Our project uses well developed speaker dependent voice recognition equipment with a small enough vocabulary to achieve usable accuracy rates.
REFERENCES
[1] B.W. Ballard, "Semantic and Procedural Processing for a Natural Language Programming System," Ph.D. Dissertation, Report CS-1979-5, Dept. of Computer Science, Duke University, Durham, NC, 1979.
[2] B.W. Ballard, "A Domain-Class Approach to Transportable Natural Language Processing," Cognition and Brain Theory, 5, pp. 269-287, 1982.
[3] B.W. Ballard and A.W. Biermann, "Programming in Natural Language: NLC as Prototype," Proceedings of the 1979 ACM National Conference, 1979.
[4] A.W. Biermann, "A Natural Language Processor for Office Automation," Proceedings of the 1982 Office Automation Conference, San Francisco, California, April, 1982.
[5] A.W. Biermann, "Natural Language Programming," to appear in Computer Program Synthesis Methodologies (Eds. Biermann and Guiho), Reidel, 1983.
[6] A.W. Biermann and B.W. Ballard, "Towards Natural Language Computation," American Journal of Computational Linguistics, Vol. 6, No. 2, pp. 71-86, 1980.
[7] A.W. Biermann, B.W. Ballard, and A.H. Sigmon, "An Experimental Study of Natural Language Programming," to appear in International Journal of Man-Machine Studies, 1983.
[8] W. Bronnenberg, S. Landsbergen, R. Scha, and W. Schoenmaker, "PHLIQA-1, A Question-Answering System for Data-Base Consultation in Natural English," Philips Tech. Rev., 38, pp. 229-239 and ?8?-?97, 1979.
[9] E.F. Codd, "Seven Steps to RENDEZVOUS with the Casual User," IBM Report J1333, 1974.
[10] F.J. Damerau, "Operating Statistics for the Transformational Question Answering System," American Journal of Computational Linguistics,
[11] H. Deas, M.Sc. Thesis, Dept. of Computer Science, Duke University, Durham, N.C., November 1982.
[12] D. Egly and K. Wescourt, "Cognitive Style, Categorizations, and Vocational Effects on Performance of REL Database Users," Joint Conference on Easier and More Productive Use of Computing Systems, Ann Arbor, Michigan, May 1981.
[13] L. Fineman, "Preliminary Results on the Voice Driven Information System Simulation Experiment," Report to IBM Corporation, Dept. of Computer Science, Duke University, Durham, N.C., 1981.
[14] P.K. Fink, "Conditionals in a Natural Language System" (Master's Thesis), Report CS-1981-8, Duke University, Durham, N.C., 1981.
[15] R. Geist, D. Kraines, and P. Fink, "Natural Language Computing in a Linear Algebra Course," Proceedings of the National Educational Computing Conference, June, 1982.
[16] N. Haas and G. Hendrix, "An Approach to Acquiring and Applying Knowledge," First National Conference on Artificial Intelligence, 1980.
[17] L.R. Harris, "User Oriented Data Base Query with the ROBOT Natural Language Query System," International Journal of Man-Machine Studies, pp. 697-713, September 1977.
[18] L. Harris, "The ROBOT System: Natural Language Processing Applied to Database Query," Proceedings of the 1978 ACM National Conference, pp.
[19] J.P. Haton and J.M. Pierrel, "Data Structures and Organization of the MYRTILLE II System," Fourth I.J.C.P.R., Kyoto, Japan, 1978.
[20] G. Heidorn, "Natural Language Dialogue for Managing an On-Line Calendar," IBM Research Report RC7447, 1978.
[21] G.G. Hendrix, E.D. Sacerdoti, D. Sagalowicz, and J. Slocum, "Developing a Natural Language Interface to Complex Data," ACM Transactions on Database Systems, Vol. 3, No. 2, pp. 105-147, 1978.
[22] G.G. Hendrix, "Human Engineering for Applied Natural Language Processing," Fifth International Conference on Artificial Intelligence, pp. 183-191, 1977.
[23] G. Hendrix and W. Lewis, "Transportable Natural Language Interfaces to Databases," Annual Meeting of the Assoc. for Computational Linguistics,
[24] R. Hershman, R. Kelly, and H. Miller, "User Performance with a Natural Language Query System for Command Control," NPRDC TR 79-7, Navy Personnel Research and Development Center, San Diego, California, January 1979.
[25] J. Krause, "Results of a User Study with the 'User Specialty Language' System and Consequences for the Architecture of Natural Language Interfaces," Technical Report 79.04.003, IBM Heidelberg Scientific Center, May 1979.
[26] W.A. Lea (Ed.), Trends in Speech Recognition, Prentice-Hall, 1982.
[27] J. Mylopoulos, A. Borgida, P. Cohen, N. Roussopoulos, J. Tsotsos, and H. Wong, "TORUS - A Natural Language Understanding System for Data Management," Proceedings of the Fourth International Conference on Artificial Intelligence, 1975.
[28] S.R. Petrick, "On Natural Language Based Computer Systems," IBM Journal of Research and Development, Vol. 20, No. 4, pp. 3??-335, 1976.
[29] W.J. Plath, "REQUEST: A Natural Language Question-Answering System," IBM Journal of Research and Development, Vol. 20, No. 4, pp. 326-335, 1976.
[30] D.R. Reddy, "Speech Recognition by Machine: A Review," Proceedings of the IEEE, Vol. 64, No. 4, pp. 50?-
[31] H. Tennant, "Experience with the Evaluation of Natural Language Question Answerers," Working Paper 18, Advanced Automation Group, Coordinated Science Lab., Univ. of Illinois, January 1979.
[32] F.B. Thompson and B.H. Thompson, "Practical Natural Language Processing: The REL System as Prototype," in Advances in Computers, Vol. 13 (Eds. M. Rubinoff and M.C. Yovits), Academic Press, New York, 1975.
[33] F. Thompson and B. Thompson, "Shifting to a Higher Gear in a Natural Language System," AFIPS Proc. of the National Computer Conf., Vol. 50, pp. 657-662, 1981.
[34] D.E. Walker (Ed.), Understanding Spoken Language, Elsevier North-Holland, New York, 1978.
[35] D.L. Waltz, "An English Language Question Answering System for a Large Relational Database," Communications of the ACM, Vol. 21, No. 7, pp. 526-
[36] W.A. Woods, R.M. Kaplan, and B. Nash-Webber, "The Lunar Sciences Natural Language Information System: Final Report," Report 2378, Bolt, Beranek, and Newman, Cambridge, MA., 1972.
[37] W.A. Woods, "Motivation and Overview