Generating Contextually Appropriate Intonation

Scott Prevost & Mark Steedman
Computer and Information Science
University of Pennsylvania
200 South 33rd Street
Philadelphia PA 19104-6389, USA
(Internet: prevost@linc.cis.upenn.edu, steedman@cis.upenn.edu)

Keywords: Speech-synthesis; Generation. We thank Mark Beutnagel and AT&T Bell Laboratories for allowing us access to the TTS speech synthesiser. The research was supported in part by NSF grant nos IRI90-18513, IRI90-16592, and IRI91-17110, DARPA grant no. N00014-90-J-1863, and ARO grant no. DAAL03-89-C0031.

Abstract

One source of unnaturalness in the output of text-to-speech systems stems from the involvement of algorithmically generated default intonation contours, applied under minimal control from syntax and semantics. It is a tribute both to the resilience of human language understanding and to the ingenuity of the inventors of these algorithms that the results are as intelligible as they are. However, the result is very frequently unnatural, and may on occasion mislead the hearer. This paper extends earlier work on the relation between syntax and intonation in language understanding in Combinatory Categorial Grammar (CCG). A generator with a simple and domain-independent discourse model can be used to direct synthesis of intonation contours for responses to database queries, to convey distinctions of contrast and emphasis determined by the discourse model.

1 The Problem

Consider the exchange shown in example (1). Capitals indicate stress, and brackets informally indicate the intonational phrasing. The intonation contour is indicated underneath using Pierrehumbert's notation ([8], [1], see [13] for a brief summary). L+H* and H* are different high pitch accents, and LH% and LL% (and its relative L) are rising and low boundaries respectively.

(1) Q: I know that the OLD widget had a SLOW processor.
       But what processor does the NEW widget include?
    A: (The     NEW     widget includes)   (a       FAST    processor)
                L+H*           LH%                  H*      LL%
        Ground  Focus   Ground              Ground  Focus   Ground
        Theme                               Rheme

The other annotations indicate that the intonational tunes L+H* LH% and H* LL% convey two distinct kinds of discourse information. First, both pitch accents mark any word that they occur on (or rather, its interpretation) for "focus", which in the context of such simple queries as example (1) usually implies contrast of some kind. Second, the tunes as a whole mark the constituent that bears them (or rather, its interpretation) as having a particular function in the discourse. We have argued at length elsewhere that, at least in this same restricted class of dialogues, the function of the L+H* LH% tune is to mark the "theme" - that is, "what the participants have agreed to talk about". The H* LL% tune (and its relative the H* L tune) marks the "rheme" - that is, "what the speaker has to say" about the theme. This phenomenon is a strong one: the same intonation contour sounds quite anomalous in the context of a question that does not establish the correct open proposition as the theme, such as "Which device has the fast processor?". One further point is worth noting: the unit that we are calling the theme is not in this example a traditional syntactic constituent. Many problems in the analysis and synthesis of spoken language result from the partial independence of syntactic and intonational phrase boundaries.

The architecture of our system (shown in Figure 1) is for the most part self-explanatory, but we note that we follow a long tradition in separating the process of generation itself into two phases. The "strategic" phase is one in which the content of the utterance is planned, including the division into theme and rheme, and the assignment of contrastive focus. The "tactical" phase is one in which content is mapped onto strings of words.

Figure 1: Architecture. Boxes, in order: Prosodically Annotated Question -> Intonational Parser -> Strategic Generator -> Tactical Generator -> Prosodically Annotated Response -> TTS Translator -> Speech Synthesizer -> Spoken Response, with a Database box alongside.

2 CCG-Based Prosody

We will assume a standard CCG of the kind discussed in [11], [12], and [13]. For example, we shall write the category of a transitive verb like include either abbreviated, as in (2)a, or in full as in (2)b:

(2) a. (S\NP)/NP
    b. (S : include' x y \ NP : y) / NP : x

In b, syntactic types are paired with a semantic interpretation via the colon operator, and the category is that of a function from NPs (with interpretation x) to functions from NPs (with interpretation y) to Ss (with interpretation include' x y). Constants in interpretations bear primes, variables do not, and there is a convention of left associativity.

We also need the following two rules of functional application, where X and Y are variables over categories in either notation:

(3) FUNCTIONAL APPLICATION:
    a. X/Y  Y    =>  X   (>)
    b. Y    X\Y  =>  X   (<)

CCG extends this strictly context-free categorial base in two respects. First, all arguments, such as NPs, bear only type-raised categories, such as S/(S\NP). Similarly, all functions into such categories, such as determiners, are functions into the raised categories, such as (S/(S\NP))/N. For example, subject NPs bear the following category in the full notation:

(4) widgets := S : s / (S : s \ NP : widgets')

The derivation of a simple transitive sentence appears as follows in the abbreviated notation:¹

(5) Widgets      include      sprockets
    --------    ---------    ------------------
    S/(S\NP)    (S\NP)/NP    (S\NP)\((S\NP)/NP)
                -------------------------------
                             S\NP
    -------------------------------------------
                        S

Second, the combinatory rules are extended to include functional composition, as well as application. The following rule will be relevant below:

(6) FORWARD COMPOSITION (>B):
    X/Y  Y/Z  =>  X/Z

This rule allows a second syntactic derivation for the above sentence, as follows:²

(7) Widgets      include      sprockets
    --------    ---------    ---------
    S/(S\NP)    (S\NP)/NP    S\(S/NP)
    ---------------------
            S/NP
    ----------------------------------
                    S

¹ The reader is encouraged to satisfy themselves using the full semantic notation that this derivation yields an S with the correct interpretation include' sprockets' widgets'. At first glance, it looks as though type-raising will expand the lexicon alarmingly. One way round this problem is discussed in [14].

² The reader is again strongly urged to satisfy themselves that the S yielded in the derivation bears the correct interpretation.

The reasons for making this move, which concern the grammar of coordinate constructions, the general class of rules from which the composition rule is drawn, and the problem of processing in the face of such associative rules, are discussed in the earlier papers, and need not concern us here. The point for present purposes is that the partition of the sentence into the object and a non-standard constituent S : include' x widgets' / NP : x makes this theory structurally and semantically perfectly suited to the demands of intonation, as exhibited in example (1).³

We can therefore directly incorporate intonational constituency in syntax, as follows (cf. [12], [13], and [15]). We assign to all constituents an autonomous prosodic category, expressing their potential for combination with other prosodic categories. Then we lock these two structural systems together via the following principle, which says that syntactic and prosodic constituency must be isomorphic:

(8) PROSODIC CONSTITUENT CONDITION:
    Combination of two syntactic categories via a syntactic combinatory rule is only allowed if their prosodic categories can also combine via a prosodic combinatory rule.

One way to do this is to make the boundaries arguments and the pitch accents functions over them. The boundaries are as follows:⁴

(9) L   := b:l
    LL% := b:ll
    LH% := b:lh

As in CCG, categories consist of a structural type, here b for boundary, and an interpretation, associated via a colon. The pitch accents have the following functional types:⁵

(10) L+H* := p:theme/b:lh
     H*   := p:rheme/b:l, p:rheme/b:ll

We further assume, following Bird [2], that the presence of a pitch accent causes some element(s) in the translation of the category to be marked as focussed, a matter which we will for simplicity assume occurs at the level of the lexicon. For example, when includes bears a pitch accent, its category will be as follows:

(11) (S : (*include') x y \ NP : y) / NP : x

³ A similar argument in a related categorial framework is made by Moortgat [6].

⁴ These categories slightly depart from Pierrehumbert.

⁵ Here we are ignoring the possibility of multiple pitch accents in the same prosodic phrase, but cf. [13].

The categories that result from the combination of a pitch accent and a boundary may or may not constitute entire prosodic phrases, since there may be a prenuclear null tone. There may also be a null tone separating the pitch accent(s) from the boundary. (Both possibilities are illustrated in (1).)
We therefore assign the following category to the null tone, which can thereby apply to the right to any non-functional category of the form X:Y, and compose to the right with any function into such a category, including another null tone, to yield the same category:

(12) ∅ := X:Y/X:Y

It is this omnivorous category that allows intonational tunes to be spread over arbitrarily large constituents, since it allows the pitch accent's desire for a boundary to propagate via composition into the null tone category (see the earlier papers).

In order to allow the derivation to proceed above the level of complete prosodic phrases identifying themes and rhemes, we need two unary category-changing rules to mark the interpretation Σ of the corresponding grammatical category with that discourse function and change the phonological category, thus:⁶

(13)   Σ           Σ
      -----  =>  -----
      p:X         p/p

(14)   Σ           Σ
      -----  =>  -----
      p:X          p

These rules change the prosodic category either to p, or to an endocentric function over p. (These types capture the fact that the LL% boundary can only occur at the end of a sentence, thereby correcting an overgeneration in the version of this theory in Steedman [13], noted by Bird [2].) The fact that p is an atom rather than a term of the form X:Y is important, since it means that it can combine only with another p. This is vital to the preservation of the intonation structure.⁷

The application of the above two rules to a complete intonational phrase should be thought of as precipitating a side-effect whereby a copy of the category Σ is associated with the clause as its theme or rheme. (We gloss over details of how themes and rhemes are associated with a particular clause, as well as a number of further complications arising in sentences with more than one theme.)

⁶ These rules represent both a departure from the earlier papers and a slight simplification of what is actually needed to allow prosodic phrases to combine correctly.

⁷ The category has the same effect of preventing further composition into the null tone achieved in the earlier papers by a restriction on forward prosodic composition.

In [13] and [15], a related set of rules of which the present ones form a subset are shown to be well-behaved with a wide range of examples. Example (15) gives the derivation for an example related to (7) (since the raised object category is not crucial, it has been replaced by NP to ease comprehension):⁸

(15) Widgets                   include                       sprockets
     (       L+H*              LH%       )                   (  H*  LL%  )
     ----------------------    --------------------------    --------------
     S:s/(S:s\NP:*widget')     (S:include' x y\NP:y)/NP:x    NP:*sprockets'
     p:theme/b:lh              b:lh                          p:rheme
     ----------------------------------------------------
     S:include' x *widget'/NP:x
     p:theme
     ----------------------------------------------------    --------------
     S:include' x *widget'/NP:x                              NP:*sprockets'
     p/p                                                     p
     ----------------------------------------------------------------------
     S:include' *sprockets' *widget'
     p

     Theme: S:include' x *widget'/NP:x
     Rheme: NP:*sprockets'

Note that it is the identification of the theme and rheme at the stage before the final reduction that determines the information structure for the response, for it is at this point that discourse elements like the open proposition are explicit, and can be used in semantically-driven synthesis of intonation contour directly from the constituents.

⁸ Note the focus-marking effect of the pitch accents.

Of course, such gushingly unambiguous intonation contours are comparatively rare in normal dialogues. Even in the context given in (7), a more usual response to the question would put low pitch - that is, the null tone in Pierrehumbert's terms - on everything except the focus of the rheme, sprockets, as in the following:

(16) Widgets include SPROCKETS.

Such an utterance is of course ambiguous as to whether the theme is widgets or what widgets include. The earlier papers show that such "unmarked" themes, which include no pitch accent because they are entirely background, can be captured by a "Null Theme Promotion Rule", as follows:⁹

(17)      Σ               Σ
      ---------   =>   ---------
      X:Y/X:Y           p:theme

⁹ See the next section concerning the nondeterminism inherent in this rule.

3 Parsing

Having established the relationship between prosody, information structure and CCG syntax, we can now address the computational problem of automatically directing the synthesis of intonation contours for responses to database queries. Our computational model (shown in Figure 1) starts with a prosodically annotated wh-question given as a string of words with associated Pierrehumbert-style pitch accent and boundary markings. We employ a simple bottom-up shift-reduce parser of the kind presented in [14], making direct use of the CCG-Prosody theory described above, to identify the semantics of the question. The inclusion of prosodic categories in the grammar allows the parser to identify the information structure (theme and rheme) within the question as well.
The focus and background information within the theme and rheme (if any) is further marked by the focus predicate * in the semantic representation. For example, given the question (18) below, the parser produces the semantic and information structure representations shown in (19).¹⁰

(18) I know that widgets contain cogs,
     but what parts do WODGETS include?
         L+H* LH%      H*      LL%

(19) prop:  s : λx[part(x) & include(*wodgets, x)]
     theme: s : λx[part(x) & include(*wodgets, x)] / (s : include(*wodgets, x) / np : x)
     rheme: s : include(*wodgets, x) / np : x

The nondeterminism inherent in unmarked themes is handled by default: the present implementation of Null Theme Promotion delivers the longest unmarked theme that the syntax permits.¹¹

¹⁰ The alert reader will note that the notation for constants, variables, and functional application is slightly changed in these sections, to correspond to the Prolog implementation.

¹¹ This is a simplification, but a harmless one for the simplified query domains that we are dealing with here.

4 Strategic Generation

The strategic phase of generating a response is somewhat simplified in the current implementation, and we have cut a number of corners. In particular, we currently assume that the question is the sole determinant of the information structure in the answer. This is undoubtedly an oversimplification. The complete specification of the semantic and information structures provided by the parser is used by the generator to determine an intelligible and prosodically natural response. For a wh-question, the semantic representation corresponds to a lambda expression in one or more variables ranging over individuals in the database, and has the structure of a Prolog query which we can evaluate to determine the possible instantiations of the open proposition.
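
Evaluating the open proposition against the database can be pictured with the following minimal sketch. It is written in Python rather than the Prolog of the actual system; the fact base, the single-variable restriction and the names `FACTS`, `QUESTION` and `answers` are all assumptions made for illustration.

```python
# Hypothetical fact base; each fact is (predicate, argument, ...).
FACTS = {
    ("part", "cogs"),
    ("part", "sprockets"),
    ("include", "widgets", "cogs"),
    ("include", "wodgets", "sprockets"),
}

# The open proposition of question (18), with focus marks as in (19):
# lambda x [ part(x) & include(*wodgets, x) ]
QUESTION = [("part", "x"), ("include", "*wodgets", "x")]


def strip_focus(term):
    """Focus marking (*) is discourse information; ignore it for database lookup."""
    return term.lstrip("*")


def answers(goals, var="x"):
    """Return the individuals that satisfy every conjunct when substituted for var."""
    individuals = sorted({arg for fact in FACTS for arg in fact[1:]})
    results = []
    for individual in individuals:
        grounded = [
            tuple(individual if term == var else strip_focus(term) for term in goal)
            for goal in goals
        ]
        if all(goal in FACTS for goal in grounded):
            results.append(individual)
    return results


print(answers(QUESTION))  # ['sprockets']
```

A Prolog system obtains the same bindings by resolution and backtracking; the exhaustive loop over individuals here only stands in for that search.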
The instantiated proposition determines the semantic proposition to be conveyed in the response. For the example above, this is part(sprockets) & include(*wodgets, *sprockets) - "Wodgets include sprockets".

Note that the derived semantics includes the necessary occurrences of the focus predicate *, determined as follows. All terms that are focused in the question semantics are focused in the response semantics. Intuitively, the instantiated variable in the response semantics must also be focused since it represents the information which is new in the response. For more complex rhemes such as quantified NPs with modifiers, we focus those elements of the semantic representation that are new in the current context (that is, ones which did not figure in the interpretation of the original query). Thus, given a question such as (1), we choose to focus the modifier "fast" rather than the noun "processor" in the rheme. Similarly, in the exchange below, we focus "processor" instead of "fast" because of its newness in the context.

(20) Q: What fast component does the widget include?
     A: The widget includes a fast PROCESSOR.

To determine an appropriate intonation contour for the utterance, we must further determine the appropriate information structure. Fortunately, for the simple question-answering task, the information structure of the response can be assumed to be completely determined by the original query. The theme of a question corresponds to "what the question is about" - in this case, "parts".
The rheme of a question corresponds to "what the speaker wants to know about the theme" - here, "what wodgets include". It follows that we expect the rheme of the question to determine the theme of the response. For example (18), the theme of the response should be s : include(*wodgets, x) / np : x, as in (21) below. Note that we simplify the strategic generation problem by including the syntactic category in our representation of the theme (as determined by the syntactic category of the rheme of the original question).¹² Given the syntactic and semantic representation of the theme of the response, the CCG combination rules can easily be invoked to determine the rheme of the response. The rheme is simply the complement of the theme with respect to the overall semantics of the response, as in (21) below, obtained by instantiating the result and one input of the appropriate combinatory rule (cf. [7]):¹³

(21) prop:  s : include(*wodgets, *sprockets)
     theme: s : include(*wodgets, x) / np : x
     rheme: np : *sprockets

¹² Here we are cutting another corner: the theme, and hence the rheme, are fully specified syntactically, as well as semantically, as a result of the analysis of the question: in a more general system, we would presumably need to specify syntactic type from scratch, starting from pure semantics.

5 Tactical Generation and CCG

Just as the shift-reduce parser sketched above can readily be made to construct the interpretations and information structures shown in the examples, specifically marking themes, rhemes and their foci, so it is relatively easy to do the reverse: to generate prosodically annotated strings from a focus-marked semantic representation of themes and rhemes.

For simplicity, we start by describing the syntactic and semantic aspects of the generator, ignoring prosody for the moment. In constructing a tactical generation schema, several design options are available, including bottom-up, top-down and semantic head-driven models ([3], [10]). We adopt a hybrid approach, employing a basic top-down strategy that takes advantage of the CCG notion of "functional head" to avoid fruitless search. While this technique exhibits some inefficiencies characteristic of a depth-first search, it has several significant advantages. First, it does not rely on a specific semantic representation, and requires only that the semantics be compositional and representable in Prolog. Thus the generating procedure is independent of the particular grammar. This modular character of the system has been very useful in developing the competence grammar proposed in the preceding section, and offers a basis for proving the completeness of the implementation with respect to the competence theory.

The tactical generation program is written in Prolog, and works as follows.
Starting with a syntactic constituent (initially s) and a semantic formula, we utilize the CCG reduction rules to determine possible subconstituents that can combine to yield the original constituent, invoking the generator recursively to generate the proposed subconstituents. The base case of the recursion occurs when a category we wish to generate unifies with a category in the lexicon. For example, suppose we wish to generate an utterance corresponding to the category s:walks'(mary'). Since the given category does not unify with any category in the lexicon, the program proposes possible subconstituents by checking the CCG combination rules in some pre-determined order. By the backward function application rule, we might hypothesize that the categories x and s:walks'(mary')\x are the subconstituents of s:walks'(mary'), where x is some variable. If we recursively call the generator on s:walks'(mary')\x, we find that it unifies with the category s:walks'(y)\np:y in the lexicon, corresponding to the lexical item walks. This unification forces the complementary category x to unify with np:mary', which yields the lexical item mary when the generator is recursively invoked. Concatenating the results of generating the proposed subconstituents therefore gives the string "Mary walks."

¹³ Again the example is simplified by the use of a non-raised category for the object.

The top-down nature of the generation scheme has a number of important consequences. First, the order in which we generate the postulated subconstituents determines whether the generation succeeds. Had we chosen to generate x before s:walks'(mary')\x, we would have entered a potentially infinite recursion, since x unifies with every category in the lexicon. For this reason, our generator always chooses to recursively generate the subconstituent that acts as the functional head before the subconstituent that acts as the argument under the CCG combinatory rules. By strictly observing this principle, we ensure that as much semantic information as possible is deployed, thereby constraining the search space by prohibiting spurious unifications with incorrect items in the lexicon. For this reason, we refer to our generation scheme as a "functional head"-driven, top-down approach.

(22) gen(s:def(x, ((engine(x)&new(x))&shiny(x))& def(y, ((gear(y)&rotating(y))&largest(y))&contains(x,y)))).
     RESULT: the shiny new engine contains the largest rotating gear.

(23) gen(s:exists(z, (engineer(z)&brilliant(z))&exists(x,(design(x)&revolutionary(x))& def(y, (engine(y)&new(y))&gave(z,y,x))))).
     RESULT: a brilliant engineer gave the new engine a revolutionary design.

(24) gen(s:def(x,(widget(x)&*new(x))&probably(contains(x,y)))/np:y @ p:theme).
     RESULT: the new@lhstar widget probably contains@lhb.

(25) gen(np:(x^s)^def(x,(processor(x)&*fastest(x))&s) @ p:rheme).
     RESULT: the fastest@hstar processor@llb.

(26) gen(s:def(x,(widget(x)&new(x))&*probably(contains(x,y)))/np:y @ p:theme).
     RESULT: the new widget probably@lhstar contains@lhb.

(27) gen(s:def(x,(*widget(x)&new(x))&contains(x,y))/np:y @ p:theme).
     RESULT: the new widget@lhstar contains@lhb.

One disadvantage of the top-down generation technique is its susceptibility to the non-termination problem. If a given path through the search space does not lead to unification with an item in the lexicon, some condition which aborts the path in question at some search depth must be imposed.
Note that whenever the CCG function application rules are used to propose possible subconstituents to be recursively generated, the subconstituent acting as the functional head has one more curried argument than its parent. Since we know that in English there is a limit to the number of arguments that a functional category can take, we can abort fruitless search paths by imposing a limit on the number of curried arguments that a CCG category can possess. The current implementation allows categories with up to three arguments, the minimum needed for constructions involving ditransitive verbs. Note that this strategy does not prohibit the generation of categories whose arguments themselves are complex categories. Thus, we allow categories such as ((s\np)/np)\(((s\np)/np)/np) for raised indirect objects, but not categories such as (((s\np)/np)/np)/np.

When the CCG composition rule is used to propose possible subconstituents, the subconstituents do not have more curried arguments than their parent. Consequently, imposing a bound of the type described above will not necessarily avoid endless recursion in all cases. Suppose, for example, that we wish to generate a category of the form s/x, where s is a fully instantiated expression and x is a variable. If the function application rules fail to produce subconstituents that generate the category, we rely on the CCG composition rule to propose the possible subconstituents s/y and y/x. Since s/x and s/y are identical categories to within renaming of variables, the recursion will continue indefinitely. We rectify this situation by invoking the composition rule only if the original category has an instantiation for both its argument and result. Such a solution imposes limitations on the types of derivations allowed by the system, but retains the simplicity and transparency of the algorithm. Merely imposing a limit on the depth of the recursion provides a more general solution.
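
The functional-head-first strategy and the bound on curried arguments can be sketched as follows. This is an illustrative Python approximation, not the Prolog program described above: Prolog's built-in unification is replaced by a small hand-written unifier, only the two function application rules are modelled (composition is left out), and the two-word lexicon and the names `Var`, `unify`, `arity`, `gen` and `MAX_ARGS` are assumptions.

```python
import itertools


class Var:
    """A logic variable; two variables are equal only if they are the same object."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(Var._ids)

    def __repr__(self):
        return f"_{self.id}"


def walk(term, subst):
    """Follow variable bindings in subst until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution unifying a and b, or None (no occurs check)."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def lexicon():
    """Hypothetical lexicon; atoms are (syntax, semantics), functions (slash, result, arg)."""
    y = Var()  # fresh variable on every call, like renaming a Prolog clause
    return [
        ("walks", ("\\", ("s", ("walks", y)), ("np", y))),
        ("mary", ("np", "mary")),
    ]


MAX_ARGS = 3  # enough for ditransitive verbs


def arity(cat, subst):
    """Count the curried arguments along the result spine of a category."""
    cat, n = walk(cat, subst), 0
    while isinstance(cat, tuple) and cat[0] in ("/", "\\"):
        n += 1
        cat = walk(cat[1], subst)
    return n


def gen(cat, subst):
    """Yield (words, substitution) pairs for cat, generating the functional head first."""
    if arity(cat, subst) > MAX_ARGS:
        return  # abort this search path
    for word, entry in lexicon():
        s1 = unify(cat, entry, subst)
        if s1 is not None:
            yield [word], s1
    arg = Var()  # backward application: Y  X\Y => X
    for head_words, s1 in gen(("\\", cat, arg), subst):
        for arg_words, s2 in gen(arg, s1):
            yield arg_words + head_words, s2
    arg = Var()  # forward application: X/Y  Y => X
    for head_words, s1 in gen(("/", cat, arg), subst):
        for arg_words, s2 in gen(arg, s1):
            yield head_words + arg_words, s2


TARGET = ("s", ("walks", "mary"))  # s:walks'(mary')
words, _ = next(gen(TARGET, {}))
print(" ".join(words))  # mary walks
```

Because the head is generated before its argument, the argument variable is already bound to np:mary' by the time the generator is called on it. The arity bound cuts off head proposals that acquire more than three arguments, which is what stops the application rules from recursing forever when no lexical item matches.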
Examples of the types of sentences that can be generated appear in (22) and (23).

This procedure can immediately be applied to the prosodically augmented grammar. To do so, we merely enforce the Prosodic Constituent Condition at each step in the generation. That is, whenever a pair of subconstituents are considered (by reversing the CCG combination rules), a pair of prosodic subconstituents are also considered and recursively generated using the prosodic combinatory rules. Examples (24) and (25) illustrate the generation of intonation for the theme and rheme of the utterance "The NEW widget probably contains the FASTEST processor".¹⁴ Examples (26) and (27) manifest the intonational results of moving the thematic focus among the various propositions in the semantic representation of the theme "The new widget probably contains ...".

6 Synthesis

We showed in the previous section how constituents of the type shown in (21) can generate intonationally annotated strings. The resulting string for the current example is "wodgets@lhstar include@lhb sprockets@[hstar, llb]." The final aspect of generation involves translating such a string into a form usable by a suitable speech synthesiser. Currently, we use the Bell Laboratories TTS system ([5]) as a post-processor to synthesise the speech wave. Example (28) shows the translated output for the same example, as it is sent to this synthesiser.

(28) \!> \!*L+H*1 wodgets \!fL1 include \!pL1 \!bH1 \!*H*2 sprockets \!pL1 \!bL1 . \(*[20]\)

We stress that we use TTS as an unmodified output device, without any fine tuning other than in the lexicon. While TTS is particularly easy to use with Pierrehumbert's notation, we are confident that our system can easily be adapted to other synthesisers.

7 Results

The system just described produces sharp and natural-sounding distinctions of intonation contour in minimal pairs of queries like the following:

¹⁴ The @ symbol separates syntactic categories from their corresponding prosodic categories and lexical items from their pitch/boundary markings.

(29) Q: I know that widgets contain cogs,
        but what gadgets include SPROCKETS?
            L+H* LH%             H* LL%
        prop:  s : λx[gadget(x) & incl(x, *sprockets)]
        theme: s : λx[gadget(x) & incl(x, *sprockets)] / (s : incl(x, *sprockets)\np : x)
        rheme: s : incl(x, *sprockets)\np : x
     A: prop:  s : incl(*wodgets, *sprockets)
        theme: s : incl(x, *sprockets)\np : x
        rheme: np : *wodgets
        WODGETS include SPROCKETS.
        H* L            L+H* LH%

(30) Q: I know that widgets contain cogs,
        but what parts do WODGETS include?
            L+H* LH%      H*      LL%
        prop:  s : λx[part(x) & incl(*wodgets, x)]
        theme: s : λx[part(x) & incl(*wodgets, x)] / (s : incl(*wodgets, x)/np : x)
        rheme: s : incl(*wodgets, x)/np : x
     A: prop:  s : incl(*wodgets, *sprockets)
        theme: s : incl(*wodgets, x)/np : x
        rheme: np : *sprockets
        WODGETS include SPROCKETS.
        L+H*    LH%     H* LL%

(31) Q: I know that programmers use widgets,
        but which people DESIGN widgets?
            L+H*  LH%    H*     LL%
        prop:  s : λx[people(x) & *design(x, widgets)]
        theme: s : λx[people(x) & *design(x, widgets)] / (s : *design(x, widgets)\np : x)
        rheme: s : *design(x, widgets)\np : x
     A: prop:  s : *design(*engineers, widgets)
        theme: s : *design(x, widgets)\np : x
        rheme: np : *engineers
        ENGINEERS DESIGN widgets.
        H* L      L+H*   LH%

(32) Q: If engineers design widgets,
        which people design WODGETS?
        L+H*  LH%           H* LL%
        prop:  s : λx[people(x) & design(x, *wodgets)]
        theme: s : λx[people(x) & design(x, *wodgets)] / (s : design(x, *wodgets)\np : x)
        rheme: s : design(x, *wodgets)\np : x
     A: prop:  s : design(*programmers, *wodgets)
        theme: s : design(x, *wodgets)\np : x
        rheme: np : *programmers
        PROGRAMMERS design WODGETS.
        H* L               L+H* LH%

Examples (29) and (30) illustrate the ability of our system to produce appropriately different intonation contours for identical strings of words depending on the context, which determines the information structure of the response. If the responses in these examples are interchanged, the result sounds distinctly unnatural in the given contexts. From examples (31) and (32), it will be apparent that our system has the ability to make distinctions in focus placement within themes and rhemes based on context. The issue of focus placement can be crucial in more complex themes and rhemes, as shown below:

(33) Q: I know the old widget has the slowest processor,
        but which widget has the FASTEST processor?
            L+H*  LH%            H*      LL%
     A: The NEW widget has the FASTEST processor.
            H*  L              L+H*    LH%

(34) Q: The old widget has the slowest processor,
        but which processor does the NEW widget have?
            L+H*  LH%                H*         LL%
     A: The NEW widget has the FASTEST processor.
            L+H*       LH%     H*      LL%

(35) Q: The new WODGET has the slowest processor,
        but which processor does the new WIDGET have?
            L+H*  LH%                    H*     LL%
     A: The new WIDGET has the FASTEST processor.
                L+H*   LH%     H*      LL%

As noted earlier, such precisely specified themes are uncommon in normal dialogue. Consequently, the Null Theme Promotion rule is employed for unmarked themes, allowing the types of responses in (36) and (37) below.
The theme is taken to be the longest possible prosodically unmarked constituent allowed by the syntax.

(36) Q: I know that programmers use widgets, but which people DESIGN widgets?
        H* LL%
     A: ENGINEERS design widgets.
        H* L

(37) Q: If engineers design widgets, which people design WODGETS?
        H* LL%
     A: PROGRAMMERS design wodgets.
        H* L

Although we have only briefly discussed the possibility of multiple pitch accents within a theme or rheme, we have included such a capability in our implementation. The system's ability to handle multiple pitch accents is illustrated by the following example.

(38) Q: I know that students USE WODGETS, but which people DESIGN WIDGETS?
        H* H* LL%
     A: ENGINEERS design widgets.
        H* L

While many important problems remain, examples like these show that it is possible to produce synthesized speech with contextually appropriate intonational contours using a combinatory theory of prosody and information structure that is completely transparent to syntax and semantics. The model of utterance generation for Combinatory Categorial Grammars presented here implements the prosodic theory in a similarly transparent and straightforward manner.

8 References

[1] Beckman, Mary and Janet Pierrehumbert: 1986, 'Intonational Structure in Japanese and English', Phonology Yearbook, 3, 255-310.

[2] Bird, Steven: 1991, 'Focus and phrasing in Unification Categorial Grammar', in Steven Bird (ed.), Declarative Perspectives on Phonology, Working Papers in Cognitive Science 7, University of Edinburgh, 139-166.

[3] Gerdeman, Dale and Erhard Hinrichs: 1990, 'Functor-driven Natural Language Generation with Categorial Unification Grammars', Proceedings of COLING 90, Helsinki, 145-150.

[4] Jackendoff, Ray: 1972, Semantic Interpretation in Generative Grammar, MIT Press, Cambridge MA.

[5] Liberman, Mark and A.L. Buchsbaum: 1985, 'Structure and Usage of Current Bell Labs Text to Speech Programs', Technical Memorandum, TM 11225-850731-11, AT&T Bell Laboratories.

[6] Moortgat, Michael: 1989, Categorial Investigations, Foris, Dordrecht.

[7] Pareschi, Remo and Mark Steedman: 1987, 'A Lazy Way to Chart-parse with Categorial Grammars', Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford CA, July 1987, 81-88.

[8] Pierrehumbert, Janet: 1980, The Phonology and Phonetics of English Intonation, Ph.D dissertation, MIT. (Dist. by Indiana University Linguistics Club, Bloomington, IN.)

[9] Pierrehumbert, Janet and Julia Hirschberg: 1990, 'The Meaning of Intonational Contours in the Interpretation of Discourse', in Philip Cohen, Jerry Morgan, and Martha Pollack (eds.), Intentions in Communication, MIT Press, Cambridge MA, 271-312.

[10] Shieber, Stuart and Yves Schabes: 1991, 'Generation and Synchronous Tree-Adjoining Grammars', Computational Intelligence, 4, 220-228.

[11] Steedman, Mark: 1990, 'Gapping as Constituent Coordination', Linguistics & Philosophy, 13, 207-263.

[12] Steedman, Mark: 1990, 'Structure and Intonation in Spoken Language Understanding', Proceedings of the 28th Annual Conference of the Association for Computational Linguistics, Pittsburgh, PA, June 1990, 9-17.

[13] Steedman, Mark: 1991, 'Structure and Intonation', Language, 68, 260-296.

[14] Steedman, Mark: 1991, 'Type-raising and Directionality in Categorial Grammar', Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley CA, June 1991, 71-78.

[15] Steedman, Mark: 1991, 'Surface Structure, Intonation, and "Focus"', in Ewan Klein and F. Veltman (eds.), Natural Language and Speech, Proceedings of the ESPRIT Symposium, Brussels, Nov. 1991, 21-38, 260-296.
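
As a rough illustration of the first step of the translation described in Section 6, the following Python sketch parses an intonationally annotated string of the form shown there into words paired with Pierrehumbert-style tone labels. The reading of the marking names (lhstar as L+H*, hstar as H*, lhb as LH%, llb as LL%), the function names, and the bracketed output format are assumptions made for this sketch. It does not reproduce the escape sequences of example (28) or any part of the TTS interface.

```python
import re

# Assumed reading of the generator's marking names as Pierrehumbert-style
# labels. This table is illustrative and is not taken from the system itself.
TONE_LABELS = {
    "hstar": "H*",     # high pitch accent
    "lhstar": "L+H*",  # rising pitch accent
    "lhb": "LH%",      # rising boundary
    "llb": "LL%",      # low boundary
}

# A word, optionally followed by "@mark" or "@[mark, mark, ...]".
TOKEN = re.compile(r"(\w+)(?:@(?:\[([^\]]*)\]|(\w+)))?")


def parse_annotated(text):
    """Return a list of (word, [tone labels]) pairs for an annotated string."""
    pairs = []
    for match in TOKEN.finditer(text):
        word, bracketed, single = match.groups()
        if bracketed is not None:
            marks = [m.strip() for m in bracketed.split(",") if m.strip()]
        elif single is not None:
            marks = [single]
        else:
            marks = []
        # An unknown marking name raises KeyError rather than being guessed.
        pairs.append((word, [TONE_LABELS[m] for m in marks]))
    return pairs


def render(pairs):
    """Render the pairs in a hypothetical bracketed form, one item per word."""
    items = []
    for word, tones in pairs:
        items.append(word + ("[" + " ".join(tones) + "]" if tones else ""))
    return " ".join(items)


example = "wodgets@lhstar include@lhb sprockets@[hstar, llb]."
print(render(parse_annotated(example)))
# prints: wodgets[L+H*] include[LH%] sprockets[H* LL%]
```

Keeping the tone table separate from the parser means that targeting a different synthesiser would, under these assumptions, only require a different rendering function over the same (word, tones) pairs.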
