Japanese Discourse and the Process of Centering


Marilyn Walker*
University of Pennsylvania

Sharon Cote‡
University of Pennsylvania

Masayo Iida†
Stanford University

* University of Pennsylvania, Computer Science Department, Philadelphia PA 19104. E-mail: lyn@linc.cis.upenn.edu.
† CSLI, Stanford University, Stanford CA 94305. E-mail: iida@csli.stanford.edu.
‡ University of Pennsylvania, Linguistics Department, Philadelphia PA 19104. E-mail: cote@linc.cis.upenn.edu.

© 1994 Association for Computational Linguistics. Computational Linguistics Volume 20, Number 2.

This paper has three aims: (1) to generalize a computational account of the discourse process called CENTERING, (2) to apply this account to discourse processing in Japanese so that it can be used in computational systems for machine translation or language understanding, and (3) to provide some insights on the effect of syntactic factors in Japanese on discourse interpretation. We argue that while discourse interpretation is an inferential process, syntactic cues constrain this process; we demonstrate this argument with respect to the interpretation of ZEROS, unexpressed arguments of the verb, in Japanese. The syntactic cues in Japanese discourse that we investigate are the morphological markers for grammatical TOPIC, the postposition wa, as well as those for grammatical functions such as SUBJECT, ga, OBJECT, o, and OBJECT2, ni. In addition, we investigate the role of speaker's EMPATHY, which is the viewpoint from which an event is described. This is syntactically indicated through the use of verbal compounding, i.e., the auxiliary use of verbs such as kureta, kita. Our results are based on a survey of native speakers' interpretations of short discourses, constructed as minimal pairs that vary in one of the above factors. We demonstrate that these syntactic cues do indeed affect the interpretation of ZEROS, but that having previously been the TOPIC and being realized as a ZERO also contribute to the salience of a discourse entity. We propose a discourse rule of ZERO TOPIC ASSIGNMENT, and show that CENTERING provides constraints on when a ZERO can be interpreted as the ZERO TOPIC.

1. Introduction

1.1 Centering in Japanese Discourse

Recently there has been an increasing amount of work in computational linguistics involving the interpretation of anaphoric elements in Japanese (Yoshimoto 1988; Kuno 1989; Walker, Iida, and Cote 1990; Nakagawa 1992). These accounts are intended as components of computational systems for machine translation between Japanese and English or for natural language processing in Japanese alone. This paper has three aims: (1) to generalize a computational account of the discourse process called CENTERING (Sidner 1979; Joshi and Weinstein 1981; Grosz, Joshi, and Weinstein 1983; Grosz, Joshi, and Weinstein unpublished), (2) to apply this account to discourse processing in Japanese so that it can be used in computational systems, and (3) to provide some insights on the effect of syntactic factors in Japanese on discourse interpretation.

In the computational literature, there are two foci for research on the interpretation of anaphoric elements such as pronouns. The first viewpoint focuses on an inferential process driven by the underlying semantics and relations in the domain (Hobbs 1985a; Hobbs et al. 1987; Hobbs and Martin 1987). The opposite focus concentrates on the role of syntactic information such as what was previously the topic or subject (Hobbs 1976b; Kameyama 1985; Yoshimoto 1988). We will argue for an intermediate position with respect to the interpretation of ZEROS, unexpressed arguments of the verb, in Japanese.
Our position is that the interpretation of zeros is an inferential process, but that syntactic information provides constraints on this inferential process (Joshi and Kuhn 1979; Joshi and Weinstein 1981). We will argue that syntactic cues and semantic interpretation are mutually constraining (Prince 1981b, 1985; Hudson-D'Zmura 1988).

The syntactic cues in Japanese discourse that we investigate are the morphological markers for grammatical TOPIC, the postposition wa, as well as those for grammatical functions such as SUBJECT, ga, OBJECT, o, and OBJECT2, ni. In addition, we investigate the role of speaker's EMPATHY, which is the viewpoint from which an event is described. This can be syntactically indicated through the use of verbal compounding, i.e., the auxiliary use of verbs such as kureta, kita.

In addition to the argument that a purely inference-based account does not consider limits on processing time, another argument against a purely inference-based account is provided by the minimal pair below. Here, the only difference is whether Ziroo is the subject or the object in the second utterance. Note that the interpretation of zeros is indicated in parentheses:

Example 1
a. Taroo ga kooen o sanpositeimasita.
   Taroo SUBJ park in walking-was
   Taroo was taking a walk in the park.
b. Ziroo ga 0 hunsui no mae de mitukemasita.
   Ziroo SUBJ OBJ fountain of front in found
   Ziroo found (Taroo) in front of the fountain.
c. 0 0 kinoo no siai no kekka o kikimasita.
   SUBJ OBJ yesterday of game of scores OBJ asked
   (Ziroo) asked (Taroo) the score of yesterday's game.

Example 2
a. Taroo ga kooen o sanpositeimasita.
   Taroo SUBJ park in walking-was
   Taroo was taking a walk in the park.
b. 0 Ziroo o hunsui no mae de mitukemasita.
   SUBJ Ziroo OBJ fountain of front in found
   (Taroo) found Ziroo in front of the fountain.
c. 0 0 kinoo no siai no kekka o kikimasita.
   SUBJ OBJ yesterday of game of scores OBJ asked
   (Taroo) asked (Ziroo) the score of yesterday's game.

In 1b and 2b, the syntactic position in which Ziroo is realized has the effect that 1c means Ziroo asked Taroo the score of yesterday's game, while 2c means Taroo asked Ziroo the score of yesterday's game. On the other hand, some purely syntactic accounts require that antecedents for zeros be realized as the grammatical TOPIC, and thus cannot explain the above example because Taroo is never explicitly marked as the topic (Yoshimoto 1988).

In the literature, ZEROS are known as zero pronouns. We adopt the assumption of earlier work that the interpretation of zeros in Japanese is analogous to the interpretation of overt pronouns in other languages (Kuroda 1965; Martin 1976; Kameyama 1985). Japanese also has overt pronouns, but the use of the overt pronoun is rare in normal speech, and is limited even in written text. This is mainly because overt pronouns like kare ('he') and kanozyo ('she') were introduced into Japanese in order to translate gender-insistent pronouns in foreign languages (Martin 1976). In this paper, we only consider zeros in subcategorized-for argument positions. Since Japanese doesn't have subject or object verb agreement, there is no syntactic indication that a zero is present in an utterance other than information from subcategorization.¹

First, in Section 1.2 we describe the methodology that we applied in this investigation. In Section 2, we present the theory of centering and some illustrative examples. Then, in Section 3, we discuss particular aspects of Japanese discourse context, namely grammatical TOPIC and speaker's EMPATHY. We will show how these can easily be incorporated into a centering account of Japanese discourse processing, and give a number of examples to illustrate the predictions of the theory.
We also discuss the way in which a discourse center is instantiated in Section 4.

In Section 5 we propose a discourse rule of ZERO TOPIC ASSIGNMENT, and use the centering model to formalize constraints on when a zero may be interpreted as a ZERO TOPIC. Our account makes a distinction between two notions of TOPIC: grammatical topic and zero topic. The grammatical topic is the wa-marked entity, which is by default predicted to be the most salient discourse entity in the following discourse. However, there are cases in which it may not be, depending on whether ZERO TOPIC ASSIGNMENT applies. This analysis provides support for Shibatani's claim that the interpretation of the topic marker, wa, depends on the discourse context (Shibatani 1990). ZERO TOPIC ASSIGNMENT actually predicts ambiguities in Japanese discourse interpretation and provides a mechanism for deriving interpretations that previous accounts claim would be unavailable.

We delay the review of related research to Section 6, when we can contrast it with our account. The two major previous accounts are those of Kuno (Kuno 1972, 1976b, 1987, 1989) and Kameyama (Kameyama 1985, 1986, 1988). Finally, in Section 7, we summarize our results and suggest topics for future research.

¹ When zero pronouns should be stipulated is still a research issue. For example, Hasegawa (1984) described a zero pronoun as a phonetically null element in an argument position. However, as shown in the following example, Terazu, Yamanasi, and Inada (1980) assumed that zero pronouns are not limited in their distribution and stipulated them in adjunct positions as well (Iida 1993).

   Taroo wa Hanako no kaban o mitukemasita.
   Taroo TOP/SUBJ Hanako GEN bag OBJ found
   Taroo found Hanako's bag.

   0 0 tanzyoobi no purezento o iremasita.
   birthday GEN present OBJ put
   (Taroo) put a birthday present (in her bag).

1.2 Methodology

Most of the examples in this paper are constructed as four-utterance discourses that fit one of a number of structural paradigms. In all of the paradigms, a discourse entity is introduced in the first utterance, and established by the second utterance as the CENTER, what the discourse is about. The manipulations of context occur with the third and the fourth utterances. In each case the zero in the third utterance cospecifies the entity already established as the center in the second utterance. The fourth utterance consists of a potentially ambiguous sentence containing two zeros. The variations in context are as shown below:

Third Utterance            Fourth Utterance           EXAMPLES
SUBJECT    OBJECT(2)       SUBJECT    OBJECT(2)
zero       NP(o or ni)     zero       zero            5
zero       NP(o or ni)     zero       zero, empathy   36
NP(ga)     zero            zero       zero            32, 34
NP(wa)     zero            zero       zero            4, 33
NP(ga)     zero            zero       zero, empathy   35

Thus we are manipulating factors such as whether a discourse entity is realized in subject or object position in the third utterance, whether a discourse entity realized in subject position is ga-marked or wa-marked in the third utterance, and whether a discourse entity realized in the fourth utterance in object position is marked as the locus of speaker's EMPATHY.

We collected a group of about 35 native speakers by solicitation on the Internet to provide judgments for most of the examples given in this paper.
These native speakers were readers of the newsgroups sci.lang.japanese and comp.research.japan. They were thus typically well-educated, bilingual engineers. Whenever an example was tested in this way, we provide the number of informants who chose each possible interpretation to the right of the example. Some examples that are included for expository reasons were not tested.

Participation in our survey was completely voluntary, and the data were collected over three surveys. Thus the numbers of subjects varied from one survey to another, and this is reflected in the numbers accompanying our examples. This data collection was carried out on written examples using electronic mail in a situation in which the informants could take as long as they wanted to decide which interpretation they preferred. The instructions sent with the surveys are given in Appendix A.

This paradigm clearly cannot provide information on which interpretation a subject might arrive at first and then perhaps change based on other pragmatic factors, and thus it contrasts with reaction time studies. However, the judgments given should be stable and should reflect the fact that our informants were able to use all the information in the discourse. It is a useful paradigm given that we are exploring the correlation of syntactic cues and discourse interpretation. It has been claimed that syntactic cues are only used in automatic processing and can be overridden by deeper processing. However, Hudson's results suggest that subjects may judge a discourse sequence to be nonsensical when it is incoherent according to centering (Hudson-D'Zmura 1988). Di Eugenio claims that discourse sequences in Italian that are not discourse-coherent according to centering theory produce a garden-path effect (Di Eugenio 1990). The methods we used allow us to explore the results of these interactions, and yet it would be beneficial for these results to be expanded upon by careful psychological experimentation.

For most of the examples reported here, we asked subjects to choose one preferred interpretation instead of allowing them to rank interpretations. The motivation for doing this was to force differences to come out for slight preferences, with the theory being that other variations would come out across subjects. In a few cases we allowed subjects to indicate no preference; these examples will be clearly indicated.

In addition, we used the same gender for multiple discourse entities to prevent any tendency for judgments to be influenced by gender stereotypes. We also avoided using verbs with causal biases toward one of their arguments, and we used few cue words such as but, because, and then, which could result in a bias toward, say, a cause-effect or temporal sequence of events interpretation. We also omitted honorific markers, which are normally a part of Japanese ambiguity resolution.² This was done to isolate the effects of the variables that we were exploring in this study, namely topic marking, grammatical function, empathy, and realization with a zero or with a full noun phrase.

² While native speakers understandably found some of these examples "stilted" or "awkward," they were still able to give their judgments based on the information that was provided in the discourses.

2. Centering Theory

Within a theory of discourse, CENTERING is a computational model of the process by which conversants coordinate attention in discourse (Grosz, Joshi, and Weinstein unpublished). Centering has its computational foundations in the work of Grosz and Sidner (Grosz 1977; Sidner 1979; Grosz and Sidner 1986) and was further developed by Grosz, Joshi, and Weinstein (1983, unpublished) and Joshi and Weinstein (1981). Centering is intended to reflect aspects of ATTENTIONAL STATE in a tripartite view of discourse structure that also includes INTENTIONAL STRUCTURE and LINGUISTIC STRUCTURE (Grosz and Sidner 1986). In Grosz and Sidner's theory of discourse structure, discourses can be segmented based on intentional structure, and a discourse segment exhibits both local and global coherence. Global coherence depends on how each segment relates to the overall purpose of the discourse; local coherence depends on aspects such as the syntactic structure of the utterances in that segment, the choice of referring expressions, and the use of ellipses. CENTERING models local coherence and is formalized as a system of constraints and rules.
Our analysis uses an adaptation of the Centering algorithm that was developed by Brennan, Friedman, and Pollard, based on these constraints and rules (Brennan, Friedman, and Pollard 1987; Walker 1989).

The purpose of centering as part of a computational model of discourse interpretation is to model ATTENTIONAL STATE in discourse in order to control inference (Joshi and Kuhn 1979; Joshi and Weinstein 1981).³ Our approach to modeling attentional state is to explore aspects of the correlation between syntax and discourse function. This assumes that there are language conventions about discourse salience and that conversants attempt to maintain a sense of shared context.

³ Recent work in situation theory proposes to control computation with a similar notion of background information in terms of constants of the situation that thus are not explicitly realized in an utterance (Nakashima 1990). The situation-theoretic work does not as yet distinguish shared knowledge that determines discourse salience and derives from the discourse context and the way utterances are expressed (Clark and Haviland 1977; Clark and Marshall 1981; Prince 1981b) from shared knowledge that is part of general background knowledge such as cultural assumptions (Prince 1978a; Joshi 1982) or shared knowledge that might derive from the task context (Grosz 1977).

Section 2.1 presents the centering rules and constraints. Sections 2.2 and 2.3 illustrate the theory and the definitions with a number of examples. Section 2.4 discusses the centering algorithm for the resolution of zeros in Japanese.

2.1 Rules and Constraints

The centering model is very simple. Each utterance in a discourse segment has two structures associated with it. First, each utterance in a discourse has associated with it a set of discourse entities called FORWARD-LOOKING CENTERS, Cf. Centers are semantic entities that are part of the discourse model. Second, there is a special member of this set called the BACKWARD-LOOKING CENTER, Cb. The Cb is the discourse entity that the utterance most centrally concerns, what has been elsewhere called the 'theme' (Reinhart 1981; Horn 1986). The Cb entity links the current utterance to the previous discourse.

The set of FORWARD-LOOKING CENTERS, Cf, is ranked according to discourse salience. We will discuss factors that determine the ranking below. The highest-ranked member of the set of forward-looking centers is referred to as the PREFERRED CENTER, Cp.⁴ The PREFERRED CENTER represents a prediction about the Cb of the following utterance. Sometimes the Cp will be what the previous segment of discourse was about, the Cb, but this is not necessarily the case.
This distinction between looking back to the previous discourse with the Cb and projecting preferences for interpretation in subsequent discourse with the Cp is a key aspect of centering theory.

⁴ The notion of PREFERRED CENTER corresponds to Sidner's notion of EXPECTED FOCUS (Sidner 1983).

In addition to the structures for centers, Cb and Cf, the theory of centering specifies a set of rules and constraints. Constraints are meant to hold strictly whereas rules may sometimes be violated.

CONSTRAINTS
For each utterance Ui in a discourse segment U1, ..., Um:
1. There is precisely one backward-looking center Cb.
2. Every element of the forward centers list, Cf(Ui), must be realized in Ui.
3. The center, Cb(Ui), is the highest-ranked element of Cf(Ui-1) that is realized in Ui.⁵

⁵ This could possibly be rephrased as: Assume the Cp(Ui-1) is the Cb(Ui) unless there is evidence to the contrary (Carter 1987).

Constraint (1) says that there is one central discourse entity that the utterance is about, and that is the Cb. The second constraint depends on the definition of realizes. An utterance U realizes a center c if c is an element of the situation described by U, or c is the semantic interpretation of some subpart of U (Grosz, Joshi, and Weinstein unpublished). Thus the relation REALIZE describes zeros, explicitly realized discourse entities, and those implicitly realized centers that are entities inferable from the discourse situation (Prince 1978a, 1981b).

A specialization of the relation REALIZE is the relation DIRECTLY REALIZE. A center is directly realized if it corresponds to a phrase in an utterance. We restrict our focus to entities realized by noun phrases; however, it is clear that propositions can be centers, so we assume that the account given here can be extended to propositional entities as well (Webber 1978; Sidner 1979; Prince 1986, 1978b; Ward 1985).

As we discuss further in Section 3, zeros refer to entities that are already in the discourse context. The fact that the current utterance REALIZES one or more zeros follows from information specified in the subcategorization frame of the verb. These arguments must be interpreted and thus acquire a degree of discourse salience that nonsubcategorized-for discourse entities lack.

Constraint (3) stipulates that the ranking of the forward centers, Cf, determines from among the elements that are realized in the next utterance, which of them will be the Cb for that utterance. If the PREFERRED CENTER, Cp(Ui), is realized in Ui+1, it is predicted to be the Cb(Ui+1).
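
Constraint (3) lends itself to a very small sketch. The Python below is only an illustration under stated assumptions: the function name `backward_looking_center`, the representation of Cf(Ui-1) as a list ordered from most to least salient, and the use of `None` for the case where nothing from Cf(Ui-1) is realized are all hypothetical choices, not part of the theory's formal statement.

```python
from typing import List, Optional, Set


def backward_looking_center(prev_cf: List[str], realized: Set[str]) -> Optional[str]:
    """Constraint 3 (sketch): return the highest-ranked element of
    Cf(U_{i-1}) that is realized in U_i.

    prev_cf  -- Cf(U_{i-1}), assumed already ordered most salient first
    realized -- the discourse entities realized in U_i
    """
    for entity in prev_cf:
        if entity in realized:
            return entity
    # Assumption of this sketch: None stands for "no Cb can be computed".
    return None


if __name__ == "__main__":
    prev_cf = ["TAROO", "ZIROO"]  # illustrative entities only
    print(backward_looking_center(prev_cf, {"TAROO", "ZIROO"}))
    print(backward_looking_center(prev_cf, {"ZIROO"}))
```

When Cp(Ui-1), the first element of the list, is among the realized entities, the loop returns it immediately, which mirrors the prediction that the preferred center becomes the next Cb.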
W e will u s e t h e f o l l o w i n g f o r w a r d c e n t e r r a n k i n g f o r J a p a n e s e : 6. (GRAMMATICAL O R ZERO) TOPIC > EMPATHY > SUBJECT > OBJECT2 > OBJECT > OTHERS. B a c k w a r d - l o o k i n g c e n t e r s , C b s , a r e o f t e n d e l e t e d o r p r o n o m i n a l i z e d a n d s o m e t r a n s i t i o n s b e t w e e n d i s c o u r s e s e g m e n t s a r e m o r e c o h e r e n t t h a n o t h e r s . A c c o r d i n g to t h e t h e o r y o f c e n t e r i n g , c o h e r e n c e is m e a s u r e d b y t h e h e a r e r ' s i n f e r e n c e l o a d w h e n i n t e r p r e t i n g a d i s c o u r s e s e q u e n c e (Joshi a n d W e i n s t e i n 1981; G r o s z , Joshi, a n d Wein- s t e i n u n p u b l i s h e d ) . F o r i n s t a n c e , d i s c o u r s e s e g m e n t s t h a t c o n t i n u e c e n t e r i n g t h e s a m e e n t i t y a r e m o r e c o h e r e n t t h a n t h o s e t h a t r e p e a t e d l y s h i f t f r o m o n e c e n t e r to a n o t h e r . T h e s e o b s e r v a t i o n s a r e e n c a p s u l a t e d i n t w o rules:. .. 2.. RULES F o r e a c h Ui in a d i s c o u r s e s e g m e n t U 1 , . . . , Urn:. If s o m e e l e m e n t o f C f ( U i - l ) is r e a l i z e d as a p r o n o u n in Ui, t h e n s o is C b ( U i ) . T r a n s i t i o n s t a t e s a r e o r d e r e d . CONTINUE is p r e f e r r e d to RETAIN is p r e f e r r e d to SMOOTH-SHIFT i s p r e f e r r e d to ROUGH-SHIFT. 7. R u l e (1) c a p t u r e s t h e i n t u i t i o n t h a t p r o n o m i n a l i z a t i o n is o n e w a y to i n d i c a t e d i s - c o u r s e salience. I t f o l l o w s f r o m R u l e (1) t h a t if t h e r e a r e m u l t i p l e p r o n o u n s in a n u t t e r a n c e , o n e o f t h e s e m u s t b e t h e Cb. I n a d d i t i o n , if t h e r e is o n l y o n e p r o n o u n , t h e n t h a t p r o n o u n m u s t b e t h e Cb. 
For Japanese, we extend this rule directly to zeros, assuming that zeros in Japanese correspond to destressed pronouns in English.

⁶ This ranking is consistent with Kuno's Empathy Hierarchies and with Kameyama's Expected Center Order (Kuno 1987; Kameyama 1985, 1988). This will be discussed in Section 6. We do not include discourse entities for verb phrases or other propositional entities in this ranking since we have not studied their contribution (but see Sidner 1979, 1981 and Carter 1987).

⁷ Smooth-shift was called shifting-1 by Brennan, Friedman, and Pollard (1987).

Rule (2) states that modeling attentional state depends on analyzing adjacent utterances according to a set of transitions that measure the coherence of the discourse segment in which the utterance occurs. Measuring coherence is based on an estimate of the hearer's inference load, but this measure must always be relative since there is no grammar of discourse. Thus methods for exploring these issues must use comparative measures of how some discourses are easier to process than others. Centering theory models this by stipulating that some transitions are preferred over others.

The typology of transitions from one utterance, Ui, to the next is based on two factors: whether the backward-looking center, Cb, is the same from Ui-1 to Ui, and whether this discourse entity is the same as the preferred center, Cp, of Ui.⁸

1. Cb(Ui) = Cb(Ui-1), or there is no Cb(Ui-1).
2. Cb(Ui) = Cp(Ui).

⁸ It is possible that restricting the relation between the Cb(Ui) and the Cb(Ui-1) to be coreference (equality) may be too strong. Future work should examine the role of shifts to functionally dependent entities or entities related by partially ordered set (POSET) relations to the previous Cb.

If both (1) and (2) hold then we are in a CONTINUE transition. The CONTINUE transition corresponds to cases where the speaker has been talking about a particular entity and indicates an intention to continue talking about that entity.⁹ If (1) holds but (2) doesn't hold then we are in a RETAIN transition. RETAIN corresponds to a situation where the speaker is intending to SHIFT onto a new entity in the next utterance and is signalling this by realizing the current center in a lower ranked position on the Cf (examples follow below).

⁹ A prediction made by the preference for CONTINUE is that intersentential antecedents for pronouns will be preferred over intrasentential candidates. This preference is one that distinguishes Centering for pronoun interpretation from the proposal made by Hobbs (1976a, 1976b). However, this preference needs to be constrained further by the fact that sortal filters may rule out the Cp of the previous utterance as the current Cb. In this case the data suggest that perhaps intrasentential candidates should be preferred (Walker 1989). Carter explored this in his extension of Sidner's theory of local focusing (Carter 1987).

If (1) doesn't hold then we are in one of the SHIFT states depending on whether or not (2) holds. This definition of transition states is summarized in Figure 1 (Brennan, Friedman, and Pollard 1987).
We will use the notation Cb(Ui-1) = [?] for cases where there is no Cb(Ui-1). Section 4 will discuss center instantiation.

                      Cb(Ui) = Cb(Ui-1)        Cb(Ui) ≠ Cb(Ui-1)
                      OR Cb(Ui-1) = [?]
Cb(Ui) = Cp(Ui)       CONTINUE                 SMOOTH-SHIFT
Cb(Ui) ≠ Cp(Ui)       RETAIN                   ROUGH-SHIFT

Figure 1
Centering transition states, rule 2.

KEY: BACKWARD-LOOKING CENTER = Cb; PREFERRED CENTER = Cp; uninstantiated Cb = [?].

The combination of the constraints, rules, and transition states makes a set of testable predictions about which interpretations hearers will prefer because they require less processing. For example, maximally coherent segments are those that require less processing time. A sequence of a CONTINUE followed by another CONTINUE should only require the hearer to keep track of one main discourse entity, which is currently both the Cb and the Cp. A single pronoun in an utterance is the current Cb (by Rule 1) and can be interpreted to cospecify the discourse entity realized by Cp(Ui-1) in one step (Constraint 3).

The ordering of the Cf is the main determinant of which transition state holds between adjacent utterances. This means that the predictions of the theory are largely determined by the ranking of the items on the Cf. But there are many factors that can contribute to the salience of a discourse entity; among them are factors that we will not examine here such as lexical semantics, intonation, word-order, and tense.¹⁰ In this paper we explore the influence of various syntactic factors, which we discuss in detail in Section 3. We will also examine the relative contribution of pronominalization and postposition marking in Section 5. We postulate that the Cf ordering will vary from language to language depending on the means the language provides for expressing discourse function. However, much of this variation can be captured in the ranking of the Cf due to the modularity of the theory.

¹⁰ See Hudson-D'Zmura (1988) for an examination of the role of lexical semantics in centering.

In Sections 2.2 and 2.3 we will present some simple examples to motivate these definitions. In Section 2.4 we will present a slightly modified version of the centering algorithm (Brennan, Friedman, and Pollard 1987). In the following discussion we assume that the centering rules and constraints and the notion of centering transition states have some cognitive reality (Brennan submitted; Hudson-D'Zmura 1988; Gordon, Grosz, and Gilliom 1993; Hudson-D'Zmura and Tanenhaus 1995). However, we make no claims about the cognitive reality of the centering algorithm that we discuss in Section 2.4.
2.2 The Distinction between Continue and Retain

This theory predicts preferences in the interpretation of utterances whose meaning depends on parameters from the discourse context. Thus if there are still multiple possibilities for interpretation after the application of all constraints and rules, the ordering on transitions applies, and CONTINUE interpretations are preferred (Rule 2). Indeed, many cases of the preference for one interpretation over another follow directly from the distinction between the transition states of CONTINUE and RETAIN. Let us look at a simple example. In the discourse segment in 3, the zero in the second sentence is understood as referring to Taroo, and not to Hanako. Remember that the interpretation of zeros is indicated with parentheses.

Example 3
a. Taroo wa Hanako o eiga ni sasoimasita.
   Taroo TOP/SUBJ Hanako OBJ movie to invited
   Taroo invited Hanako to the movie.

   Cb: TAROO
   Cf: [TAROO, HANAKO]

b. 0 itiniti-zyuu nani mo te ni tukimasendesita.
   SUBJ all-day anything even hand to attached-not
   (Taroo) could not do anything all day.

   Cb: TAROO
   Cf: [TAROO]

In example 3, the Cf from 3a contains the discourse entity for Taroo as the first element and for Hanako as the second element. When the unexpressed argument is interpreted in 3b, the information from this Cf is used. Because the zero subject may REALIZE either Taroo or Hanako, both Constraint 3 and Rule 1 would be obeyed with either interpretation.¹¹ However, by interpreting the zero as Taroo, Taroo is the Cb, and it is possible to get a preferred CONTINUE interpretation Taroo could not do anything all day. In this interpretation, Taroo is both the Cb(3b) and the Cp(3b).

11 The hypothesis that wa in 3a instantiates Taroo as the Cb will be discussed in Section 4.
2.3 The Distinction between Smooth-Shift and Rough-Shift

In example 4, we illustrate the difference between the transition states of ROUGH-SHIFT and SMOOTH-SHIFT. Remember that ROUGH-SHIFT is claimed to be less coherent than SMOOTH-SHIFT (Brennan, Friedman, and Pollard 1987). In both cases the speaker has shifted the center to a different discourse entity. However, in the SMOOTH-SHIFT transition state, the speaker has indicated an intention to continue talking about the recently shifted-to entity by realizing that entity in a highly ranked Cf position such as subject, whereas no such indication is available with the ROUGH-SHIFT transition. The numbers shown to the right of an interpretation correspond to how many native speakers preferred that interpretation.

Example 4
a. Taroo ga kooen de hon o yondeimasita.
   Taroo SUBJ park at book OBJ reading-was
   Taroo was reading a book in the park.

   Cb: [?]
   Cf1: [TAROO, BOOK]
         SUBJ   OBJ

b. 0 koora o kai ni baiten ni hairimasita.
   SUBJ cola OBJ buy to shop into entered
   (Taroo) entered a shop to buy a cola.

   Cb: TAROO
   Cf1: [TAROO, COLA]    CONTINUE
         SUBJ   OBJ

c. Ziroo wa 0 sokode guuzen dekuwasimasita.
   Ziroo TOP/SUBJ OBJ there by chance met
   Ziroo met (Taroo) there by chance.

   Cb: TAROO
   Cf: [ZIROO, TAROO]    RETAIN
        TOP    OBJ

d. 0 0 eiga ni sasoimasita.
   SUBJ OBJ movie to invited
   (Ziroo) invited (Taroo) to a movie.

   Cb: ZIROO
   Cf1: [ZIROO, TAROO]   SMOOTH-SHIFT   32
         SUBJ   OBJ
   Cf2: [TAROO, ZIROO]   ROUGH-SHIFT    2
         SUBJ   OBJ

In example 4, the use of TOPIC marking in the phrase Ziroo wa of utterance (c) means that (c) is interpreted as a RETAIN.¹² Ziroo becomes the most highly ranked discourse entity for (c), although Taroo is the Cb since Taroo was most highly ranked for utterance (b) (by Constraint 3). Then when we apply the Centering algorithm in (d), there are two candidates for the Cb(d) from the Cf(c), both Ziroo and Taroo. However, this time when Constraint 3 applies, stipulating that the Cb must be the highest-ranked element of Cf(c) realized in 4d, Ziroo must be the highest-ranked entity realized, and therefore must be the Cb. At this point it is clear that some kind of SHIFT is forced by the application of Constraint 3. The two candidates are a SMOOTH-SHIFT and a ROUGH-SHIFT. The SMOOTH-SHIFT interpretation corresponds to the reading Ziroo invited Taroo to a movie whereas the ROUGH-SHIFT interpretation corresponds to the Taroo invited Ziroo reading. The SMOOTH-SHIFT interpretation is more highly ranked, thus considered more coherent, and so is the preferred interpretation (Z = 10.93, p < .001).

12 It has also been claimed that symmetric verbs such as meet by chance mark EMPATHY on the subject (Kuno 1976a).

2.4 The Centering Algorithm

The CENTERING ALGORITHM that was proposed by Brennan, Friedman, and Pollard incorporates the centering rules and constraints in addition to contra-indexing constraints on coreference (Reinhart 1976; Brennan, Friedman, and Pollard 1987; Iida 1993). These contra-indexing constraints specify that in a sentence such as He likes him, he and him cannot co-specify the same discourse entity. The algorithm applies centering theory to the problem of resolving anaphoric reference. Application of the algorithm requires three basic steps.

1. GENERATE possible Cb-Cf combinations.
2. FILTER by constraints, e.g. contra-indexing, sortal predicates, centering rules and constraints.
3. RANK by transition orderings.

In order to apply this algorithm to Japanese, possible Cb-Cf combinations (GENERATE, step 1) must be constructed from the surface string and information from the subcategorization frame of the verb. First the verb subcategorization is examined, and if there are more entities than appear in the surface string, zeros are postulated as forward centers.
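The three steps, together with the postulation of zeros from the subcategorization frame, can be sketched in a few lines. The following Python sketch is an illustration under simplifying assumptions, not the implementation of Brennan, Friedman, and Pollard. It handles only utterances whose forward centers are all zeros, and it omits sortal filters. Rule 1 holds trivially, because every realized entity is a zero. The names `postulate_zeros`, `transition`, and `resolve_zeros` and the list-of-roles encoding of a subcategorization frame are hypothetical. The transition ordering CONTINUE > RETAIN > SMOOTH-SHIFT > ROUGH-SHIFT is the ordering of Rule 2.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Rule 2 preference order, most coherent first.
TRANSITION_ORDER = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT"]


def postulate_zeros(subcat: Sequence[str], overt: Sequence[str]) -> List[str]:
    """Roles in the verb's frame that have no overt NP are treated as zeros."""
    return [role for role in subcat if role not in overt]


def transition(cb: str, cp: str, prev_cb: Optional[str]) -> str:
    """Transition state from Cb(Ui), Cp(Ui) and Cb(Ui-1)."""
    if prev_cb is None or cb == prev_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


def resolve_zeros(zero_roles: Sequence[str],
                  prev_cf: Sequence[str],
                  prev_cb: Optional[str]) -> List[Tuple[str, Tuple[str, ...], str]]:
    """Return (transition, Cf, Cb) readings, most preferred first.

    zero_roles must be listed in Cf rank order (e.g. SUBJ before OBJ),
    so that the first zero is the Cp of the current utterance.
    """
    if not zero_roles:
        return []
    readings = []
    # GENERATE: assign an entity from Cf(Ui-1) to every zero.
    for cf in product(prev_cf, repeat=len(zero_roles)):
        # FILTER: contra-indexing, co-arguments may not share a referent.
        if len(set(cf)) < len(cf):
            continue
        # Constraint 3: the Cb is the highest-ranked element of Cf(Ui-1)
        # that is realized in the current utterance.
        cb = next(entity for entity in prev_cf if entity in cf)
        readings.append((transition(cb, cf[0], prev_cb), cf, cb))
    # RANK: stable sort by the Rule 2 ordering.
    readings.sort(key=lambda reading: TRANSITION_ORDER.index(reading[0]))
    return readings


# Utterance 4d: both arguments of 'invited' are unexpressed.
zeros = postulate_zeros(["SUBJ", "OBJ"], [])
for state, cf, cb in resolve_zeros(zeros, ["ZIROO", "TAROO"], "TAROO"):
    print(state, list(cf), "Cb =", cb)
```

Run on utterance 4d of example 4, with Cf(4c) = [ZIROO, TAROO] and Cb(4c) = TAROO, the sketch lists the SMOOTH-SHIFT reading [ZIROO, TAROO] ahead of the ROUGH-SHIFT reading [TAROO, ZIROO], matching the preference reported for that example.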
These zeros are then treated just like pronouns in English by the rest of the algorithm. We use a different ranking for the Cf for Japanese than for English, but this has no effect on the actual algorithm itself since the Cf ranking is a declarative parameter.

The steps of the algorithm given above can be interleaved to improve computational efficiency. A simple implementation is to:

• Never propose a Cf that violates linguistic constraints on contra-indexing. (In other words, apply the contra-indexing filter as early as possible to avoid Cb-Cf combinations that will be eliminated by that filter.)

• If there are pronouns in an utterance, only propose pronouns as possible Cbs. (Collect the pronouns from the proposed Cfs as Cbs, from Rule 1.)

In addition, it is simple to add additional filters to step (2) of the algorithm. For instance, any constraint that is lexically specified such as [±animacy] can be easily applied as a filter. It is also possible to pursue a 'best first' strategy by interleaving steps (1), (2), and (3) so that a CONTINUE will be found without extra processing if one exists.

In example 5, we illustrate in more detail how the steps of the algorithm work and the difference between CONTINUE and RETAIN. Each utterance shows what the Cb and Cf would be for that utterance. We will mostly be concerned with the process of resolving the two zeros in utterance 5c.

Example 5
a. Taroo wa saisin no konpyuutaa o kaimasita.
   Taroo TOP/SUBJ newest of computer OBJ bought
   Taroo bought a new computer.

   Cb: TAROO
   Cf: [TAROO, COMPUTER]

b. 0 John ni sassoku sore o misemasita.
   SUBJ John OBJ2 at once that OBJ showed
   (Taroo) showed it at once to John.

   Cb: TAROO
   Cf: [TAROO, JOHN, COMPUTER]    CONTINUE

c. 0 0 atarasiku sonawatta kinoo o setumeisimasita.
   SUBJ OBJ2 newly equipped function OBJ explained
   (Taroo) explained the newly equipped functions to (John).
   Cb: TAROO
   Cf1: [TAROO, JOHN]    CONTINUE   27
         SUBJ   OBJ
   Cf2: [JOHN, TAROO]    RETAIN     1
         SUBJ  OBJ
   Cf3: [JOHN, JOHN]     CONTRA-INDEX FILTER
         SUBJ  OBJ
   Cf4: [TAROO, TAROO]   CONTRA-INDEX FILTER
         SUBJ   OBJ

Example 5c has explained as the main verb, which requires an animate subject and object2. Since there are two animate zeros in 5c, which are also contra-indexed by syntactic constraints, both John and Taroo must be realized in 5c. Constraint 3 restricts the Cb to Taroo as the highest-ranked element from the Cf(5b). The interpretive process must also generate the possible candidates for the Cf. If no constraints applied, then all four candidates shown above as Cf1, Cf2, Cf3, and Cf4 would be possible. However, the contra-indexing filter will rule out Cf3 and Cf4. As mentioned above, there is no reason that these filters cannot be applied at the GENERATE phase rather than later on.

The only CONTINUE interpretation available, Taroo explained the newly equipped functions to John, corresponds to the forward centers Cf1. It is a CONTINUE interpretation because Cb(5c) = Cb(5b) and also Cb(5c) = Cp(5c). The RETAIN interpretation is less preferred and is defined by the fact that Cb(5c) = Cb(5b), but Cb(5c) ≠ Cp(5c). This example supports the claim that a CONTINUE is preferred over a RETAIN (Z = 13.24, p < .001).

In order to find this preferred CONTINUE interpretation in a 'best first' fashion, Taroo as the Cp(Ui-1) would be tried first as the Cb(Ui), and as the interpretation for the subject. Contra-indexing rules out Taroo as the object, so John would be tried next as the object.

In the next section, we examine further the application of centering to the interpretation of zeros in Japanese. We will examine the ranking of forward centers that we have adopted for Japanese and explain how this is partially determined by the way the Japanese language allows a speaker to express discourse functions. We will also give some examples of the interpretation of zeros in cases involving Japanese discourse markers for TOPIC and EMPATHY.

3. Centering in Japanese

The theory of centering is a formal specification that is intended to model attentional state and is defined by the rules and constraints given in Section 2.1. Attentional state in turn constrains the discourse participant's interpretation process; one aspect of attentional state is the notion of discourse salience. In the centering model, the ordering of the forward centers is an approximation of discourse salience. This in turn is the main determinant of discourse interpretation processes such as the resolution of zeros in Japanese. A crucial question then is what discourse factors must be considered to determine the ordering of the forward centers, Cf, in Japanese discourse.

Being a subject has been shown to be an important factor for English; this is reflected in a Cf ordering by grammatical function (Prince 1981b; Brennan, Friedman, and Pollard 1987; Hudson-D'Zmura 1988; Brennan submitted). Aspects of surface order may also affect the interpretation (Di Eugenio 1990; Hajicova and Vrbova 1982). An interpretation algorithm can also use pronominalization as an indicator of what the speaker believes is salient (Grosz, Joshi, and Weinstein unpublished). Furthermore, zeros in Japanese are not realized syntactically, so there must be a way to distinguish zeros from other entities inferred to be part of a discourse situation. Consider:

Example 6
Taroo ga 0 aimasita.
Taroo SUBJ OBJ2 met
Taroo met (0).

This sentence is not felicitous unless the addressee has already been given some information about the person that Taroo met, either in the current discourse or in previous discourses. In contrast, nonsubcategorized-for arguments such as adjuncts are not necessarily given a specific interpretation, but rather are given a nonspecific one.
Example 7
Taroo ga Hanako ni aimasita.
Taroo SUBJ Hanako OBJ2 met
Taroo met Hanako.

The sentence means that Taroo met Hanako at some time in some place: the temporal location of the meeting situation need not be specified. The speaker can utter this sentence even if the addressee does not know where and when Taroo met Hanako. Thus, in this work, we only represent obligatorily subcategorized arguments of the verb on the Cf, assuming that the salience of discourse entities is partially determined by virtue of filling a verb's argument role, and the information from the subcategorization frame is used to determine that a zero is present in an utterance.

Zeros are then interpreted with reference to the current context. Prince has proposed that the current context should be categorized by ASSUMED FAMILIARITY (Prince 1981b; Horn 1986), with a concomitant goal of determining the correlation between the use of certain linguistic forms and the types of assumed familiarity. The first division of assumed familiarity is into the subtypes of NEW, INFERABLE, and EVOKED. NEW can be divided into BRAND-NEW, discourse entities that are both new to the discourse and new to the hearer, and UNUSED, discourse entities old to the hearer but new to the discourse. The information status of EVOKED can be further divided into TEXTUALLY EVOKED, old in the discourse and therefore old to the hearer as well, and SITUATIONALLY EVOKED, entities in the current situation. INFERABLES are technically both hearer-new and discourse-new but depend on information that is old to the hearer and the discourse, and are often treated by speakers as though they were both hearer-old and discourse-old.
There is a hierarchy of assumed familiarity in terms of discourse salience:

Assumed Familiarity Hierarchy (Prince 1981b):
TEXTUALLY EVOKED > SITUATIONALLY EVOKED > INFERABLE > UNUSED > BRAND-NEW

Zeros typically refer to EVOKED entities,¹³ but there is a scale of relative salience among the EVOKED entities. In our theory this is modeled with Cf ranking. We repeat the proposed ranking of the Cf here and justify it in the following sections:¹⁴

Cf Ranking for Japanese:
(GRAMMATICAL OR ZERO) TOPIC > EMPATHY > SUBJECT > OBJECT2 > OBJECT > OTHERS

The relevance of the notions of TOPIC and speaker's EMPATHY to centering is that a discourse entity realized as the TOPIC or the EMPATHY LOCUS is more salient and should be ranked higher on the Cf. Whenever a discourse entity simultaneously fulfills multiple roles, the entity is usually ranked according to the highest ranked role.

In the following sections we will discuss the motivation for this ranking. Section 3.1 discusses the role of the grammatical topic marker wa in Japanese. Section 3.2 explains the role of EMPATHY in Japanese discourse salience and shows that (GRAMMATICAL OR ZERO) TOPIC > EMPATHY and that EMPATHY > SUBJ. Section 3.2.1 shows how the centering algorithm handles utterances with empathy loci. Zero topics will not be discussed until Section 5.

3.1 Topic

Discourse entities that are EVOKED, INFERABLE, or UNUSED can be marked as the TOPIC.
The speaker cannot mark an entity as the grammatical TOPIC unless the hearer is aware of the object that s/he is going to talk about (Prince 1978a; Kuno 1976b). For example:

Example 8
Hutari wa paatii ni kimasita.
two-person TOP/SUBJ party to came
Speaking of two persons, they came to the party.

13 Under certain circumstances that we cannot explore here, it appears that zeros can at times be used to refer to inferable or unused entities, just as pronouns in English sometimes can be.

14 This ranking resembles Kuno's Empathy Hierarchy and Kameyama's Expected Center Order, but we distinguish two kinds of TOPIC and we posit that OBJECT2 is more salient than OBJECT. We continue Kuno's use of the term EMPATHY to represent the EMPATHY LOCUS, whereas Kameyama used the property IDENT for EMPATHY (Kameyama 1988).

Example 8 is felicitous only when hutari ('two persons') is understood as meaning the two people under discussion. The sentence never means that the people who came to the party numbered two.

The fact that the wa-marked entity should be discourse-old is also shown by the fact that a wh-question cannot be answered with a wa-marked NP.

Example 9
a.   Dono hito ga Ziroo o bengosimasita ka.
     which person SUBJ Ziroo OBJ defended Q
     Which person defended Ziroo?

b-1. Taroo ga Ziroo o bengosimasita.
     Taroo SUBJ Ziroo OBJ defended
     Taroo defended Ziroo.

b-2. *Taroo wa Ziroo o bengosimasita.
     Taroo TOP/SUBJ Ziroo OBJ defended
     Taroo defended Ziroo.
What the question context shows is that even in a simple declarative sentence, the use of the topic marker wa contrasts with the subject marker ga in what is understood as already in the discourse context. For instance, as a discourse-initial utterance, 10a assumes either no shared information or that someone defended Ziroo, and asserts that the someone is Taroo. In 10b, the discourse-old proposition is that Taroo did something and what is asserted is that what he did was defend Ziroo.

Example 10
a. Taroo ga Ziroo o bengosimasita.
   Taroo SUBJ Ziroo OBJ defended
   Taroo defended Ziroo.

b. Taroo wa Ziroo o bengosimasita.
   Taroo TOP/SUBJ Ziroo OBJ defended
   Taroo defended Ziroo.

While topics are often subjects, subject and grammatical topic need not coincide. Any argument can be realized as a topic, as shown in examples 11 and 12.

Example 11
Taroo wa Hanako ga bengosita.
Taroo TOP Hanako SUBJ defended
As for Taroo, Hanako defended (him).

Example 12
Tokyoo e wa Hanako ga itta.
Tokyo to TOP Hanako SUBJ went
To Tokyo, Hanako went.

The assumption that the TOPIC is more salient than the SUBJECT, when the two are different, is supported by the fact that an indefinite NP in subject position such as who, which, or somebody cannot be regarded as the TOPIC: an indefinite NP is never marked by the topic marker wa, but by the subject marker ga. For example:

Example 13
Dono hito ga Ziroo o bengosimasita ka.
which person SUBJ Ziroo OBJ defended Q
Which person defended Ziroo?

Example 14
*Dono hito wa Ziroo o bengosimasita ka.
which person TOP/SUBJ Ziroo OBJ defended Q
Which person defended Ziroo?

It is clear from these examples that the grammatical topic, the wa-marked entity, in Japanese represents assumable shared information in an ongoing conversation. It has been taken to be the 'theme' or 'what the sentence is about' (Kuno 1973; Shibatani 1990). In our framework, this is the role of the Cb. We will provide evidence supporting this position in Section 4. However, we claim that this is just a default and that other factors can contribute to establishing or continuing an entity as the Cb. Kuno also claims that a zero subject is equivalent to a wa-marked entity, and we provide support for this claim in Section 5, showing that the property of having previously been the Cb, in combination with being realized by a zero, contributes to an entity being the Cp.

3.2 Empathy

Kuno (1976b) proposed a notion of EMPATHY in order to present the speaker's position or identification in describing a situation. In a hugging situation involving a man named Taroo and his son Saburoo, Kuno notes that this situation can be described in various ways, some of which are shown in example 15.

Example 15
a. Taroo hugged Saburoo.
b. Taroo hugged his son.
c. Saburoo's father hugged him.

These sentences differ from each other with respect to camera angle, the position that the speaker takes to observe and describe this situation. In 15a, the speaker is assumed to be describing the event objectively: the camera is placed at the same distance from both Taroo and Saburoo. On the other hand, the camera may be placed closer to Taroo in 15b and closer to Saburoo in 15c. This is shown by the use of relational terms such as son and father, respectively. The term EMPATHY is used for this camera angle, which indicates the speaker's position among the participants in the event described.¹⁵

15 The speaker's position is not determined by his physical proximity, but rather is measured by the emotional or social relationship. In this sense, the term speaker's identification (Kuno 1976b) may be more suitable than the term speaker's position. Furthermore, the notion of EMPATHY is different from that of perspective (Iida 1993). Empathy is the speaker's identification with a discourse entity, but the speaker does not have to take the perspective of the person who he empathizes with. For example, consider the following utterance:

(i) Taroo wa Hanako ni migigawa no hon o totte-kureta.
    Taroo TOP/SUBJ Hanako OBJ2 right GEN book OBJ take-gave
    Taroo did Hanako a favor in taking a book on his/her right.
In this example, the speaker empathizes with Hanako as indicated by the empathy verb kureru, yet he still can describe the given situation from Taroo's perspective, which is indicated by ambiguity in the interpretation of the deictic expression migigawa no ('right of').

In Japanese the realization of speaker's empathy is especially important when describing an event involving giving or receiving. There is no way to describe a giving and receiving situation objectively (Kuno and Kaburaki 1977). In 16, the use of the verb kureru indicates the speaker's empathy with Ziroo, the discourse entity realized in object position, while in 17, the speaker's empathy with the subject Taroo is indicated by the use of the past tense form yatta of the verb yaru.

Example 16
Taroo ga Ziroo ni hon o kureta.
Taroo SUBJ Ziroo OBJ2 book OBJ gave
Taroo gave Ziroo a book.
EMPATHY = OBJ2 = ZIROO

Example 17
Taroo ga Ziroo ni hon o yatta.
Taroo SUBJ Ziroo OBJ2 book OBJ gave
Taroo gave Ziroo a book.
EMPATHY = SUBJ = TAROO

A verb that is sensitive to the speaker's empathy is an EMPATHY-LOADED verb. The EMPATHY LOCUS is the argument position whose referent the speaker automatically identifies with. In other words, the verb kureru has the EMPATHY LOCUS on the object, while verbs like yaru place the EMPATHY LOCUS on the subject.

The use of deictic verbs such as kuru ('come'), iku ('go'), okuru ('send to'), and yokosu ('send in') also encodes the speaker's empathy. For example, the speaker indicates empathy with Taroo by using the past tense form kita of the verb kuru in the following example.

Example 18
Hanako wa Taroo no tokoro ni kita.
Hanako TOP/SUBJ Taroo of place to came
Hanako came to Taroo's place.
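The effect of an empathy locus on the Cf can be made concrete by ranking each entity according to the highest-ranked role it fills. The sketch below is illustrative only. The function name `rank_cf` and the dictionary encoding of roles are hypothetical, and the caller is assumed to have already determined which argument is the EMPATHY LOCUS. The resulting orderings are what the proposed Cf ranking would give for examples 16 and 17, not annotations reported in the text.

```python
from typing import Dict, List, Set

# The proposed Cf ranking for Japanese, highest-ranked role first.
CF_RANK = ["TOPIC", "EMPATHY", "SUBJECT", "OBJECT2", "OBJECT", "OTHERS"]


def rank_cf(roles: Dict[str, Set[str]]) -> List[str]:
    """Order entities by the highest-ranked role each one fills.

    roles maps an entity to the set of roles it fills in the utterance;
    an entity filling several roles is ranked by the best of them.
    """
    def best_rank(entity: str) -> int:
        return min(CF_RANK.index(role) for role in roles[entity])

    # sorted() is stable, so entities with equal rank keep their input order.
    return sorted(roles, key=best_rank)


# Example 16 (kureta): the empathy locus is the OBJECT2, Ziroo.
example_16 = {
    "TAROO": {"SUBJECT"},
    "ZIROO": {"OBJECT2", "EMPATHY"},
    "BOOK": {"OBJECT"},
}
# Example 17 (yatta): the empathy locus is the SUBJECT, Taroo.
example_17 = {
    "TAROO": {"SUBJECT", "EMPATHY"},
    "ZIROO": {"OBJECT2"},
    "BOOK": {"OBJECT"},
}

print(rank_cf(example_16))   # ['ZIROO', 'TAROO', 'BOOK']
print(rank_cf(example_17))   # ['TAROO', 'ZIROO', 'BOOK']
```

Because the ranking is held in a single list, a different language-specific ordering could be tried by changing `CF_RANK` alone, which reflects the point made in Section 2.4 that the Cf ranking is a declarative parameter.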
Many Japanese verbs can be made into empathy-loaded verbs because of a productive verb-compounding operation by which these empathy-loaded verbs are used as the auxiliary verb, attaching to the main verb.¹⁶ For example, kureru can be used as a suffix, to mark OBJ or OBJ2 as the EMPATHY LOCUS. The attachment of yaru marks SUBJECT as the EMPATHY LOCUS. The complex predicate made by this operation inherits the EMPATHY LOCUS of the suffixed verb. For example:

Example 19
Hanako ga Taroo ni hon o yonde-kureta.
Hanako SUBJ Taroo OBJ2 book OBJ read-gave
Hanako did Taroo a favor in reading a book.
EMPATHY = OBJ2 = TAROO

In this case Taroo is interpreted as the EMPATHY LOCUS because of the auxiliary kureta attached to the main verb. Similarly in example 20, the speaker indicates empathy with Hanako by using the past tense form yatta of the verb yaru as an auxiliary verb to the main verb tazuneru.

16 Certain intransitive verbs cannot be made into empathy-loaded verbs since the empathy-loaded versions make no sense, e.g. moreru (leak).

Example 20
Hanako ga Taroo o tazunete-yatta.
Hanako SUBJ Taroo OBJ visit-gave
(lit.) Hanako received a favor in visiting Taroo.
EMPATHY = SUBJ = HANAKO

As demonstrated in the following examples, a discourse entity that is realized as the EMPATHY LOCUS must be EVOKED.

Example 21
Taroo ga Ziroo ni okane o kasite-kureta.
Taroo SUBJ Ziroo OBJ2 money OBJ lend-gave
Taroo did Ziroo a favor in lending him some money.

Example 22
*Taroo ga dareka ni okane o kasite-kureta.
Taroo SUBJ somebody OBJ2 money OBJ lend-gave
Taroo did somebody a favor in lending him some money.

Example 23
*Taroo ga misiranu hito ni okane o kasite-kureta.
Taroo SUBJ unknown person OBJ2 money OBJ lend-gave
Taroo did a stranger a favor in lending him some money.

The contrast between 21, 22, and 23 demonstrates that the use of a BRAND-NEW entity in the EMPATHY LOCUS position of the verb give is not acceptable. Therefore an entity in the EMPATHY LOCUS position is ranked in a higher position on the Cf than the subject.

3.2.1 Empathy and the Centering Algorithm

Using the Centering Algorithm, we model EMPATHY as a language-specific discourse factor by adding the EMPATHY-marked discourse entity to the Cf ranking. Then preferences for CONTINUE over RETAIN when EMPATHY is involved can be demonstrated, as in example 24 below:¹⁷

Example 24
a. Hanako wa kuruma ga kowarete komatteimasita.
   Hanako TOP/SUBJ car SUBJ broken at a loss-was
   Her car broken, Hanako was at a loss.

   Cb: HANAKO
   Cf: [HANAKO, CAR]

b. Taroo ga 0 sinsetu-ni te o kasite-kuremasita.
   Taroo SUBJ OBJ2/EMP kindly hand OBJ lend-gave
   Taroo kindly did (Hanako) a favor in helping her.

   Cb: HANAKO
   Cf: [HANAKO, TAROO]
        EMPATHY SUBJ

c. Tugi no hi 0 0 eiga ni sasoimasita.
   next of day SUBJ OBJ movie to invited
   Next day (Hanako) invited (Taroo) to a movie.

   Cb: HANAKO
   Cf1: [HANAKO, TAROO]   CONTINUE   16
         SUBJ    OBJ
   Cf2: [TAROO, HANAKO]   RETAIN     2
         SUBJ   OBJ

17 The verb form kuremasita in 24b is the polite form of kureta, the past tense form of the verb kureru.

In 24c, the verb invited requires an animate subject and object, and these must be realized by different discourse entities because of the contra-indexing constraint.
Hanako is the most highly ranked entity from 24b that is realized in 24c, and therefore must be the Cb. The preferred interpretation is therefore she invited him to a movie (Z = 5.25, p < .001). This corresponds to Cf1, the more highly ranked CONTINUE transition, in which Hanako is the preferred center, Cp. This interpretation can be found with minimal processing by trying the Cp(24b), Hanako, as the Cb(24c), by interpreting the subject zero as Hanako. This gives a CONTINUE transition. Then contra-indexing constraints mean that Hanako cannot fill both argument positions, so the object position is interpreted as Taroo. This interpretation is found with minimal processing by interleaving the steps of the Centering algorithm proposed in Brennan et al. (1987).

Note that nothing special needs to be said about the fact that EMPATHY is the discourse factor that made Hanako the Cp in 24b and thus predicted that Hanako would be the Cb at 24c (pace Brennan, Friedman, and Pollard 1987). The preference in the interpretation follows from the distinction between CONTINUE and RETAIN and the ranking of Cf. Thus, the centering framework is easily adapted to handle this language-specific feature.

3.3 Topic and Empathy

In general the assignment of the EMPATHY relationship is pragmatic. It is determined by the speaker's relation to the discourse participants in the discourse. In 24, for example, the EMPATHY relationship between the speaker and Hanako and between the speaker and Taroo is clear: the use of the empathy verb in the second sentence indicates that the speaker is closer to Hanako than to Taroo.
However, besides cases where the speaker clearly expresses who s/he empathizes with, it is also possible for the context to provide some information about the speaker's proximity relationship with discourse participants in the given discourse, so that the hearer can determine the EMPATHY relation that the speaker has in mind. In this paper, we only consider cases where EMPATHY is syntactically marked by the use of empathy-loaded verbs.

Kuno's notion of EMPATHY is more general. For instance, Kuno's EMPATHY HIERARCHY consists of different scales for EMPATHY that include notions such as TOPIC and SPEAKER (Kuno 1987). Kuno's Topic Empathy Hierarchy suggests that the discourse entity realized as the TOPIC will often coincide with the EMPATHY LOCUS:

Topic Empathy Hierarchy: Discourse-Topic > Discourse-Nontopic
Given an event or state that involves A and B such that A is coreferential with the topic of the present discourse and B is not, it is easier for the speaker to empathize with A than with B.

In support of Kuno's claim, we have found that when no empathy relation is clearly indicated and no topic has been clearly established, it is difficult for a hearer to determine the empathy relation that the speaker intends. Previous Cbs and current Cps can be high on the empathy scale, and yet the discourse entity realized as the grammatical TOPIC does not necessarily coincide with the discourse entity realized as the EMPATHY LOCUS. A simple sentence to show this point is given in example 25 below:

Example 25
Taroo wa Ziroo ni hon o yonde-kuremasita.
Taroo TOP/SUBJ Ziroo OBJ2 book OBJ read-gave
Taroo gave Ziroo a favor of reading a book.
EMPATHY = OBJ2 = ZIROO

In example 25, Taroo is the TOPIC while Ziroo is the EMPATHY LOCUS. Similarly, a zero does not have to be realized as the EMPATHY LOCUS. In 26b the zero in the subject position realizes the Cb and refers to Taroo.

Example 26
a. Taroo wa syukudai o zenbu yari-oemasita.
   Taroo TOP/SUBJ homework OBJ all do-finished
   Taroo finished his homework.
b. 0 Ziroo ni hon o yonde-kuremasita.
   SUBJ Ziroo OBJ2 book OBJ read-gave
   (Taroo) gave Ziroo a favor of reading a book.
   EMPATHY = OBJ2 = ZIROO

TOPIC is higher than EMPATHY in the Cf ranking. The higher degree of salience of TOPIC over EMPATHY is shown by the different interpretations of the (b) sentences in examples 27 and 28. The only difference in these examples is that Mitiko is wa-marked in 27a but is ga-marked in 28a:

Example 27
a. Mitiko wa kanai o gityoo ni osite-kuremasita.
   Mitiko TOP/SUBJ wife OBJ/EMP chairman OBJ2 recommend-gave
   Mitiko did my wife a favor in recommending her as chairperson.
b. 0 asu no kaihyoo-kekka o tanosimi-ni siteimasu.
   SUBJ tomorrow of results OBJ look-forward doing-is
   (Mitiko) is looking forward to tomorrow's results.

Example 28
a. Mitiko ga kanai o gityoo ni osite-kuremasita.
   Mitiko SUBJ wife OBJ/EMP chairman OBJ2 recommend-gave
   Mitiko did my wife a favor in recommending her as chairperson.
b. 0 asu no kaihyoo-kekka o tanosimi-ni siteimasu.
   SUBJ tomorrow of results OBJ look-forward doing-is
   (Mitiko) is looking forward to tomorrow's results.
   (My wife) is looking forward to tomorrow's results.

The TOPIC Mitiko is preferred as the unexpressed subject of the (b) sentence in example 27. [18] On the other hand, the subject Mitiko is not strongly preferred, as shown in example 28: the zero in the second sentence in 28 is understood as referring to either Mitiko or my wife. That is, the possible interpretations in these examples show that the NP my wife, which is realized as the EMPATHY LOCUS, is not as salient as the TOPIC. [19]

So why is it easier to empathize with a discourse entity that has been the topic, as Kuno demonstrates? It seems important to keep the notions of TOPIC and EMPATHY separate, but in Section 5.1 we will demonstrate an effect where the topic entity is interpreted as the empathy locus. We claim that the ranking of the Cf and the potential for a CONTINUE interpretation determine whether this effect will hold. In other words, the tendency for the topic entity to be interpreted as the empathy locus follows from more general discourse processing factors, such as a hearer preferring CONTINUE transitions within a given local stretch of discourse.
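The ranking claims made so far (an EMPATHY LOCUS outranks the subject, and TOPIC outranks EMPATHY) can be illustrated with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: `RANK_ORDER` and `rank_cf` are hypothetical names, markers are plain strings, and the placement of OBJ2 and OBJ below SUBJ is assumed here rather than taken from this section. An entity carrying several markers (for example TOP/SUBJ or OBJ/EMP) is ranked by its most salient one.

```python
from typing import Dict, List, Set

# Lower index = more salient. TOPIC > EMPATHY > SUBJ follows the text;
# the order of OBJ2 and OBJ below SUBJ is an assumption of this sketch.
RANK_ORDER = ["TOPIC", "EMPATHY", "SUBJ", "OBJ2", "OBJ"]


def rank_cf(realizations: Dict[str, Set[str]]) -> List[str]:
    """Order entities by the most salient marker each one carries."""
    def best_rank(entity: str) -> int:
        return min(RANK_ORDER.index(marker) for marker in realizations[entity])
    return sorted(realizations, key=best_rank)


# Example 24b: Taroo is SUBJ, the zero realizing Hanako is OBJ2/EMP.
ex_24b = {"TAROO": {"SUBJ"}, "HANAKO": {"OBJ2", "EMPATHY"}}
# Example 27a: Mitiko is TOP/SUBJ, my wife is OBJ/EMP.
ex_27a = {"WIFE": {"OBJ", "EMPATHY"}, "MITIKO": {"TOPIC", "SUBJ"}}

print(rank_cf(ex_24b))  # ['HANAKO', 'TAROO']
print(rank_cf(ex_27a))  # ['MITIKO', 'WIFE']
```

For 28a the same function would place WIFE above MITIKO; as footnote 19 notes, the ambiguity observed there is attributed to no Cb having been established, not to the ranking itself.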
3.4 Summary

To summarize, we have outlined the roles of discourse markers such as those for TOPIC and EMPATHY by which Japanese grammaticizes some aspects of discourse function, and we have argued that TOPIC and EMPATHY markers can only be used on entities that are already in the discourse context.

One factor that hasn't been discussed is the role of pronominalization, but many researchers have argued that discourse entities realized by pronouns are more salient than other discourse entities (Clark and Haviland 1977; Grosz, Joshi, and Weinstein unpublished; Kuno 1976b, 1987). We take zeros in Japanese to be analogous to pronouns in English in this respect. Since pronominalization can apply at any position in the ranking of the Cf, the role of its contribution is particularly interesting when it is in conflict with some other factor such as grammatical function or topic marking. This will be discussed further in Section 5.

4. Initial Center Instantiation

INITIAL CENTER INSTANTIATION is a process by which a discourse entity introduced in a segment-initial utterance becomes the Cb. In our framework, this happens as a side effect of the Centering Algorithm.
Typically, when an interpretation is found for the second utterance in a discourse segment, the Cb becomes instantiated. [20] The Cb of an initial utterance Ui is treated as a variable that is then unified with whatever Cb is assigned to the subsequent utterance Ui+1.

Typically, a discourse entity is introduced as a ga-marked subject, and then is referred to by a zero in a subsequent utterance (Clancy and Downing 1987). Consider example 29.

[18] The zero may be interpreted as indirectly referring to the speaker. This interpretation is always possible when the verb kureru is used: the use of kureru implies that the speaker is closer to the beneficiary argument (i.e. the o-marked NP in these examples), and the favor given to this person is understood as a benefit to the speaker as well.

[19] Although it seems as though empathy isn't higher than subject, the conflating factor is that topic marking establishes a Cb, whereas in 28 no Cb has been established. This is explained in detail in Section 4.

[20] In Walker, Iida, and Cote (1990) we called this Center Establishment. Henceforth we will refer to this process as Center Instantiation in order to avoid confusion with Kameyama's term center establishment, which is a different mechanism in her theory (Kameyama 1985).

Example 29
a. Taroo ga deeta o konpyuutaa ni utikondeimasita.
   Taroo SUBJ data OBJ computer in was-storing
   Taroo was storing the data in a computer.
   Cb: [?]
   Cf: [TAROO, DATA]
b. 0 yatto hanbun yari-owarimasita.
   SUBJ finally half do-finished
   Finally (Taroo) was half finished.
   Cb: TAROO
   Cf: [TAROO]  CONTINUE
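The treatment of the segment-initial Cb as a variable can be pictured with example 29. This is only a sketch under stated assumptions: an uninstantiated Cb is represented by `None`, and `backward_center` and `instantiate_initial_cb` are hypothetical helper names that do not come from the Centering Algorithm itself.

```python
from typing import List, Optional, Tuple


def backward_center(prev_cf: List[str], cur_cf: List[str]) -> Optional[str]:
    """Return the highest-ranked entity of prev_cf that is realized in cur_cf."""
    for entity in prev_cf:
        if entity in cur_cf:
            return entity
    return None


def instantiate_initial_cb(
    initial_cb: Optional[str], initial_cf: List[str], next_cf: List[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (Cb of the initial utterance, Cb of the next utterance)."""
    next_cb = backward_center(initial_cf, next_cf)
    if initial_cb is None:  # unbound variable: unify it with the next Cb
        initial_cb = next_cb
    return initial_cb, next_cb


# Example 29: Cb(29a) = [?], Cf(29a) = [TAROO, DATA], Cf(29b) = [TAROO]
cb_29a, cb_29b = instantiate_initial_cb(None, ["TAROO", "DATA"], ["TAROO"])
print(cb_29a, cb_29b)  # TAROO TAROO
```

If the initial Cb were already bound, the function would leave it unchanged, so only an unbound Cb is filled in from the following utterance.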
Using Taroo as the subject in example 29a is not enough to establish this discourse segment as being about Taroo. It is the use of the zero in example 29b that serves to instantiate Taroo as the Cb. By our definition of CONTINUE, 29b is a CONTINUE transition, because Cb(29b) = Cp(29b) and there was no Cb in 29a. However, Kuno argues that referring to a discourse entity with a zero is equivalent to marking it as the grammatical topic with wa (Kuno 1972). Our interpretation of this argument is that the use of wa in a discourse-initial utterance instantiates the wa-marked entity as the Cb in one utterance. This claim is supported by the contrast with the GA-WA alternation in examples 30 and 31, where there is a shift in interpretation depending on whether Taroo is marked with wa in the first sentence. [21]

Example 30
a. Taroo ga Ziroo o min'na no mae de tatakimasita.
   SUBJ OBJ all of front in hit
   Taroo hit Ziroo in front of all the other people.
   Cb: [?]
   Cf: [TAROO, ZIROO]
b. Itiniti-zyuu, kanzen-ni 0 0 musi-simasita.
   all-day completely ignored
   (Ziroo) ignored (Taroo) all day.
   Cb: TAROO  Cf: [TAROO, ZIROO]  3
   Cb: ZIROO  Cf: [ZIROO, TAROO]  8

In example 30, Taroo is introduced by ga. In this case, it appears that there is a tendency due to lexical semantics to instantiate Ziroo as the Cb in the second utterance. [22]

By the centering definitions, taking either Taroo or Ziroo to be the Cb can result in a CONTINUE interpretation.
However, assuming that the Cf ordering at example 30a is correct, constraint 3 is violated by the preferred interpretation of 30b. Since both of the entities in Cf(30a) are realized, the Cb in example 30b should be the most highly ranked one. There are two possible conclusions here: (1) in discourse-initial utterances, when no clear indication of topic is given, the Cf ordering alone is not a strong constraint; (2) the ordering of the Cf should be partly determined by lexical semantics or other knowledge about the situation being described. However, compare example 30 with example 31.

[21] These examples were tested by asking survey participants to indicate preference rankings. The numbers given here are only for those subjects who expressed strong preferences; some subjects expressed no preference.

[22] The number of subjects here is too small to test statistically.

Example 31
a. Taroo wa Ziroo o min'na no mae de tatakimasita.
   SUBJ OBJ all of front in hit
   Taroo hit Ziroo in front of all the other people.
   Cb: TAROO
   Cf: [TAROO, ZIROO]
b. Itiniti-zyuu, kanzen-ni 0 0 musi-simasita.
   all-day completely ignored
   (Taroo) ignored (Ziroo) all day.
   Cb: TAROO  Cf: [TAROO, ZIROO]  10
   Cb: ZIROO  Cf: [ZIROO, TAROO]  4

The use of wa in example 31 seems to override the semantic preference that was exhibited in example 30, so that subjects now prefer an interpretation in which Taroo is the Cb.
[23] This shows that Taroo has not been instantiated as the Cb when it is time to interpret the two zeros in example 30b. We explain the contrast by assuming that the TOPIC instantiates the Cb when it is first introduced in a discourse-initial utterance, as in example 31a. Then the only way to get a CONTINUE interpretation for 31b is for Taroo to be the Cb at 31b.

Furthermore, we can detect no differences in the interpretation of the final utterance between three-utterance sequences in which an entity is introduced by wa, and four-utterance sequences in which an entity is first introduced by ga and then realized by a zero in the second utterance. This provides further support for the claim that the status