A Tradeoff between Compositionality and Complexity in the Semantics of Dimensional Adjectives


Geoffrey Simmons
Graduiertenkolleg Kognitionswissenschaft
Universität Hamburg
Bodenstedtstr. 16
D-W-2000 Hamburg 50
Germany
e-mail: simmons@bosun2.informatik.uni-hamburg.de

Abstract

Linguistic access to uncertain quantitative knowledge about physical properties is provided by dimensional adjectives, e.g. long-short in the spatial and temporal senses, near-far, fast-slow, etc. Semantic analyses of the dimensional adjectives differ on whether the meaning of the differential comparative (6 cm shorter than) and the equative with factor term (three times as long as) is a compositional function of the meanings of the difference and factor terms (6 cm and three times) and the meanings of the simple comparative and equative, respectively. The compositional treatment comes at the price of a meaning representation that some authors ([Pinkal, 1990], [Klein, 1991]) find objectionably unparsimonious. In this paper, I compare semantic approaches by investigating the complexity of reasoning that they entail; specifically, I show the complexity of constraint propagation over real-valued intervals using the Waltz algorithm in a system where the meaning representations of sentences appear as constraints (cf. [Davis, 1987]). It turns out that the compositional account is more complex on this measure. However, I argue that we face a tradeoff rather than a knock-down argument against compositionality, since the increased complexity of the compositional approach may be manageable if certain assumptions about the application domain can be made.

TOPIC AREAS: semantics, AI-methods in computational linguistics
1 Introduction

In the past decade, the field of knowledge representation (KR) has seen impressive growth of sophistication in the representation of uncertain quantitative knowledge about physical properties in commonsense reasoning and qualitative physics. The input to most of these systems is entered by hand, but some of them, especially those with commonsense domains involving spatial and temporal knowledge, are amenable to interaction by means of a natural language interface. Linguistic access to knowledge about properties such as durations, rates of change, distances, the sizes of the symmetry axes of objects, and so on, is provided by dimensional adjectives (e.g. long-short in the spatial and temporal senses, fast-slow, near-far, tall-short). In this paper, I will investigate two aspects of their semantics that have an impact on the quality of a KR system with an NL interface.

One aspect is the complexity of reasoning entailed by their semantic interpretations. As an example, suppose that we have a text about the installation of new kitchen appliances that contains the following sentences:

(1) a. The refrigerator is about 60 cm wide.
    b. The cupboard is about as deep as the refrigerator is wide.
    c. The kitchen table is about 5 cm longer than the cupboard is deep.
    d. The oven is about twice as high as the table is long.

We may view the relations expressed by these sentences as constraints on the measurements of the object axes (the width of the fridge, the depth of the cupboard, and so on), which are represented as parameters in a constraint system. Then constraint propagation, along with some knowledge about the sizes that are typical for object categories, should allow us to derive the following sentences (among others) from (1):

(2) a. The cupboard is about 60 cm deep.
    b. The kitchen table is longer than the refrigerator is wide.
    c. The kitchen table is short
       (for a kitchen table).
    d. The oven is about 70 cm higher than the cupboard is deep.
    e. The oven is high (for an oven).

The inferences from (1) to (2) are rather simple, but reasoning can become very complicated if a large number of parameters and constraints must be accounted for. As we will see below, the computational properties of this kind of reasoning are dependent on the types of relations that appear in the knowledge base. Thus in the present paper, I investigate the kinds of relations that appear in formal theories of the meanings of the following morphosyntactic constructions of dimensional adjectives:

(3) a. Positive: The board is long/short.
    b. Comparative: The board is (6 cm) longer/shorter than the table is wide.
    c. Equative: The board is (three times) as long as the table is wide.
    d. Measurement: The board is 50 cm long.

This brings us to the second issue: the compositionality of meaning representations proposed for the sentences in (3). It is appealing from the viewpoint of theoretical linguistics to regard each of the morphosyntactic categories (positive, etc.) as lexical items with their own semantics, and to assume that the semantics of each sentence in (3) is a compositional function of the semantics of the morphosyntactic category and the semantics of the adjective stem. Compositional meaning representations may also be computationally more advantageous, since they can be computed very efficiently from syntactic representations (e.g. in unification-based formalisms).
Most formal theories of the meanings of adjectives attempt to fulfill this criterion of compositionality, but as we will see, they differ on a more far-reaching criterion: whether the meaning of the differential comparative (6 cm shorter than) and the equative with factor term (three times as long as) is a compositional function of the meanings of the difference and factor terms (6 cm and three times) and the meanings of the simple comparative and equative, respectively. Although compositionality is generally regarded as a virtue in and of itself, some authors ([Pinkal, 1990], [Klein, 1991]) have objected to compositional treatments of difference and factor terms on the grounds that they introduce an excessive amount of mathematical structure into our linguistic models.

In section 3, I will compare semantic representations that do and do not foresee a compositional treatment of difference and factor terms by analyzing the complexity of reasoning that they entail. In particular, I will investigate the complexity of constraint propagation in a system where the meaning representations appear as constraints. In this paradigm, uncertain quantitative knowledge is accounted for with real-valued intervals, a popular choice in KR systems, and constraint propagation is performed by the Waltz algorithm (which gets its name from David Waltz [1975]). Ernest Davis [1987] shows in his detailed analysis that the Waltz algorithm is one of the best choices for this task, for reasons that I will explain in section 3.1.

It turns out that constraint propagation with the Waltz algorithm is more complex under the compositional approach; thus, we apparently face a tradeoff between compositionality and complexity.
I argue in section 4 that this is indeed a tradeoff, since the non-compositional formation of meaning representations may be expensive, and the increased complexity of the compositional approach may be manageable, especially if certain assumptions can be made about the domain of physical properties being represented.

¹ I have only recently become acquainted with Eero Hyvönen's "tolerance propagation" (TP) approach to constraint propagation over intervals (see [Hyvönen, 1992]), which in some circumstances can compute solutions that are superior to those of the Waltz algorithm, but at the price of increased complexity. I comment on this briefly in section 3.2.

2 Compositionality in the Semantics of Adjectives

There is a vast amount of linguistic data on which a formal semantics of adjectives can be evaluated, such as the interaction of comparative and equative complements with scope-bearing operators: quantifiers, logical connectives, modal operators and negative polarity items (e.g. John is taller than I will ever be). A good theory must also account for the phenomenon of markedness, i.e. the semantic asymmetry of the antonyms (see [Lyons, 1977, Sect. 9.1]). However, I will ignore these issues in order to focus on the matter of compositionality. Thus I classify the existing theories of adjective meaning very coarsely as 'compositional' or 'non-compositional'. Note that these labels indicate only whether or not the treatment of difference and factor terms is compositional (in other respects, all of the theories mentioned below are compositional).

To begin with, I presuppose a component of dimensional designation that determines which property of an object is described by an adjective, thus determining that short conference describes a duration but short stick describes the length of the stick's elongated axis. Each class of properties (duration, length, etc.) is assumed to be associated with a set of degrees reflecting their magnitudes. I will simply use the function expression $\mathrm{amount}(p(x))$ to denote the degree to which entity $x$ exhibits property $p$. Each set of degrees is assumed to be ordered, and I will use the symbols $\sqsubset$ and $\sqsubseteq$ for the ordering relation. Most authors assume measurement theory ([Krantz et al., 1971]) as the axiomatic basis in the formal semantics of linguistic measurement expressions (cf. [Klein, 1991]). For measurement expressions such as 3 cm, I simply use a tuple $(3, \mathrm{cm})$ denoting a degree. Finally, I follow [Bierwisch, 1989] in using the symbol $N_C(a)$ for the 'norm' expected for amount $a$ in context $C$. This reflects the usual assumption that the positive expresses a relation to a context-dependent standard. In this paper, I will restrict my attention to norms that are typical for the categories named in the sentence, such as tall for an adult Dutchman, slow for a sports car, etc.²

Semantic Analyses of Dimensional Adjectives: formal interpretations of (3)

a. Positive      $\mathrm{amount}(\mathrm{length}(\mathrm{board})) \;\{\sqsupset / \sqsubset\}\; N_C(\mathrm{length}(\mathrm{board}))$
b. Comparative   $\mathrm{amount}(\mathrm{length}(\mathrm{board})) \;\{\sqsupset / \sqsubset\}\; \mathrm{amount}(\mathrm{width}(\mathrm{table}))$
c. Equative      $\mathrm{amount}(\mathrm{length}(\mathrm{board})) \sqsupseteq \mathrm{amount}(\mathrm{width}(\mathrm{table}))$
d. Measurement   $\mathrm{amount}(\mathrm{length}(\mathrm{board})) = (50, \mathrm{cm})$

Table 1: Non-compositional approach.

a. Positive      $\mathrm{amount}(\mathrm{length}(\mathrm{board})) \;\{\sqsupset / \sqsubset\}\; D \pm N_C(\mathrm{length}(\mathrm{board}))$
b. Comparative   $\mathrm{amount}(\mathrm{length}(\mathrm{board})) \;\{\sqsupset / \sqsubset\}\; D \pm \mathrm{amount}(\mathrm{width}(\mathrm{table}))$
c. Equative      $\mathrm{amount}(\mathrm{length}(\mathrm{board})) \sqsupseteq n \times \mathrm{amount}(\mathrm{width}(\mathrm{table}))$
d. Measurement   $\mathrm{amount}(\mathrm{length}(\mathrm{board})) = (50, \mathrm{cm})$

Table 2: Compositional approach.
The class of theories that I am referring to as 'non-compositional' includes those of [Cresswell, 1976], [Hoeksema, 1983] and [Pinkal, 1990], who propose formulas similar to those in Table 1 as interpretations of the sentences in (3). The relation used in place of the expression $\{\sqsupset / \sqsubset\}$ is $\sqsupset$ for the unmarked case (e.g. tall) and $\sqsubset$ for the marked case (short).³

² Clearly, there are many other kinds of norms. Jan is tall may mean tall for his age, taller than I expected, etc. [Sapir, 1944] is still one of the best surveys of the norms employed in natural language, while Bierwisch has a more modern analysis.

³ Of course, Tables 1 and 2 are strong simplifications that fail to reflect important differences between the authors mentioned that are unrelated to the issue of compositionality.

I call this approach non-compositional because interpretations of the differential comparative (6 cm longer than) and of the equative with factor term (three times as long as) are not derivable from the formulas shown in lines (b) and (c) (the same can be said of [Kamp, 1975] and [Klein, 1980]).

The compositional approach is taken by [Hellan, 1981], [von Stechow, 1984] and [Bierwisch, 1989], whose renderings of (3) are, in simplified form, something like those in Table 2. The symbol '$\pm$' is $+$ in the unmarked case and $-$ in the marked case, and '$\times$' stands for scalar multiplication.⁴

⁴ In measurement theory, the '+' operation is interpreted as concatenation in the empirical domain, and scalar multiplication is interpreted as repeated concatenation. Krantz et al. [1971] show that under proper axiomatization, concatenation is homomorphic to addition on the reals.

In the case of the positive and the ordinary comparative, the difference term $D$ is existentially quantified, as is the factor term $n$ in the case of the ordinary equative (with the additional condition that $n$ is greater than or equal to one). But if the difference or factor term is realized in the sentence surface, then its contribution to (b) and (c) in Table 2 is embedded compositionally.⁵

⁵ Bierwisch [1989] differs from the other authors advocating a compositional approach in that he does not assume the interpretation of the equative shown in Table 2. He points out (p. 85) that this analysis does not account for the fact that the equative is norm-related in the marked case: Fritz is as short as Hans presupposes that Fritz and Hans are short. Moreover, it is not clear whether this approach can capture the duality of comparatives and equatives: Fritz is taller than Hans should be semantically equivalent to Hans is not as tall as Fritz. However, Bierwisch does assume a representation like this for equatives with realized factor terms.

For the computational analysis, we will need to classify the relations shown in Tables 1 and 2, since these relations form the input to a knowledge base. But to do so, we must first decide what sorts of entities the difference and factor terms denote. I assume that they do not denote constants, since we may be just as uncertain of their magnitudes as we are of the other magnitudes mentioned in the sentences. Thus it should be possible to treat each of the mini-discourses in (4)-(6) in a similar fashion:

(4) a. The board is 90 to 100 cm long.
    b. In fact, it is about 95 cm long.

(5) a. The board is longer than the table is wide.
    b. In fact, it is about 6 cm longer.

(6) a. The board is five to ten times as long as the table is wide.
    b. In fact, it is about seven times as long.

The information given in (b) in (4)-(6) can be accounted for by simply modifying the terms introduced in (a). Hence, the difference and factor terms, like the 'amount' terms in Tables 1 and 2, denote uncertain quantities whose magnitude may be constrained by sets of sentences. I will refer to these terms generally as 'parameters'.

With this assumption, we can classify the relations in Tables 1 and 2 as follows:

(7) Non-compositional
    a. Ordering relations (Positive, Comparative, Equative)
    b. Linear relations of the form $\mathrm{amount}(x) + D \sqsupseteq \mathrm{amount}(y)$ (Differential Comparative)
    c. Product relations of the form $n \times \mathrm{amount}(x) \sqsupseteq \mathrm{amount}(y)$ (Equative with factor term)

(8) Compositional
    a. Linear relations (Positive, Comparative, Differential Comparative)
    b. Product relations (Equative with and without factor term)

In both approaches, measurements simply serve to identify the degree to which an object exhibits the property in question.

Under the compositional approach, it is possible to assume a single semantic representation in the lexicon for each adjective stem and each morphosyntactic category such that the formulas in Table 2 are generated from those lexical entries. Bierwisch [1989], for example, proposes lexical entries of the following form for each dimensional adjective:

    $\lambda c\, \lambda x\, [\mathrm{amount}(p(x)) = (v \pm c)]$

where $c$ is a difference value and $v$ is a comparison value (see [Bierwisch, 1989] for details).

But the elegance of the compositional approach comes at the price of lexical semantic representations that include addition and multiplication operators, which is precisely what Pinkal [1990] and Klein [1991] have criticized: they find the assumption of mathematical operations as basic constituents of lexical meaning uncomfortably strong.
This is one of the reasons why Pinkal proposes separate lexical entries for each morphosyntactic form of an adjective.

3 The Complexity of Constraint Propagation

The objection to the complexity of the lexical meaning representations required for the compositional approach appeals to intuitions of parsimony, and is in part a matter of philosophical opinion that may be difficult to resolve. Perhaps a decision could be made on the basis of psycholinguistic experimentation, but I will pose a more utilitarian question in this section by examining whether the increase in representational complexity in the transition from Table 1 to Table 2 entails an increase in the computational complexity of reasoning for a knowledge base containing those representations. The reasoning paradigm to be investigated is constraint propagation (sometimes called constraint satisfaction) over real-valued intervals.

Intervals are intended to account for uncertainty in quantitative knowledge. For example, the measurement of a parameter at 20 units on some scale with a possible measurement error of ±0.5 units is represented as [19.5, 20.5], to be interpreted as meaning that the unknown measurement value in question lies somewhere in the set $\{x \mid 19.5 \le x \le 20.5\}$. Additional knowledge about the relations that hold between parameters constrains their possible values to smaller sets (hence the term 'constraints' for the propositions in a knowledge base expressing such relations).

Constraint propagation over intervals has been applied in spatial reasoning ([McDermott and Davis, 1984; Davis, 1986; Brooks, 1981; Simmons, 1992]), temporal reasoning (e.g. [Dean, 1987; Allen and Kautz, 1985]) and in systems of qualitative physics (see [Weld and deKleer, 1990; Bobrow, 1985]). Intervals have a very obvious weakness in that the highly precise choice of endpoints can rarely be well-motivated in natural domains such as these.
In particular, the reasoner may draw very different inferences, e.g. about whether two intervals overlap, if the endpoint of some interval is changed by what seems to be an insignificant amount. Thus, as McDermott and Davis [1984] note, such a system must not only be able to report whether they overlap, but also "how close" they come to overlapping.

    If they do come close ..., then ... [the reasoner] must decide whether to act on the suspect information or work to gather more, which is really the only interesting decision in a case like this. Eventually, when all possible information has been gathered, if things are still close to the borderline then a decision maker must just use some arbitrary criterion to make a decision. We don't see how anyone can escape this. [McDermott and Davis, 1984, p. 114]

A formalism such as fuzzy logic attempts to alleviate the problem of sharp borderlines by using infinitely many intermediate truth values for vague predicates. I happen to have reservations about the adequacy of fuzzy logic for this task⁶, but I have chosen to study constraint propagation mainly because its computational properties are well-researched and are attractive for applications in which the potential overprecision of endpoints can be tolerated. Thus it provides a sound basis for comparing the semantic analyses presented in section 2.

3.1 Syntax and Semantics

In the following, I briefly review some definitions from [Davis, 1987, Appendix B] (with slight modifications).

Syntax

Assume a set of symbols $X = \{X_1, \dots, X_p\}$ called parameters. A label is written $[x_-, x_+]$ with real numbers $0 \le x_- \le x_+$;⁷ the symbol $\infty$ may also be used for $x_-$ and $x_+$. A labelling $L$ for $X$ is a function from parameters to labels. If $L$ is understood, we write $X_i \doteq [x_-, x_+]$ for $L(X_i) = [x_-, x_+]$.
A constraint is a formula over parameters in $X$ in some accepted notation (e.g. $X_1 \times X_2 = X_3$ or $p \le -X_1 + X_2 + X_3 \le q$). A constraint system $\mathcal{C} = \langle X, C, L \rangle$ consists of a set $X$ of parameters, a set $C$ of constraints over $X$, and a labelling $L$ for $X$.

⁶ This is not because I object to the notion of truth measurement, but rather because I believe that the fuzzy logicians' assumption that the connectives of a logic of vagueness are truth functional is contradicted by the facts of human reasoning about vague concepts (as argued by [Pinkal, to appear]). In my opinion, a formalism for truth measurement would have to be more like probability theory.

⁷ I assume the non-negative reals for simplicity, because most of the physical properties mentioned in the examples have non-negative measurement scales. Even some of the exceptions, such as the common temperature scales, are in fact equivalent to a scale of non-negative values.

Semantics

A valuation $V$ for $X$ is a function from the parameters to reals. The denotation of a label $[x_-, x_+]$ is the set

    $D([x_-, x_+]) = \{x \mid x_- \le x \le x_+\}$  if $x_+ \ne \infty$,
    $D([x_-, \infty]) = \{x \mid x_- \le x\}$  if $x_- \ne \infty$,
    $D([\infty, \infty]) = \{\infty\}$  otherwise.

A labelling $L$ is interpreted as restricting the set of possible valuations for $X$ to those $V$ such that for all $X_i \in X$, if $L(X_i) = [x_-, x_+]$, then $V(X_i) \in D([x_-, x_+])$. Thus we may view $L$ as denoting a set of valuations on the parameters; we refer to this set as $V(L)$.

A constraint $C_j$ denotes the largest set of valuations that are consistent with the relation expressed by $C_j$; call this set $V(C_j)$.

3.2 Constraint Propagation Algorithms
The task of a constraint propagation algorithm (CPA) is to tighten the interval labels in an attempt to either (1) find a labelling that is just tight enough to be consistent with the constraints and initial labelling, or (2) signal inconsistency. Constraint propagation separates a stage of assimilation, during which intervals are tightened, from querying, during which the tightened values are reported. It is also possible to infer previously unknown relations between the parameters in the querying stage by inspecting the tightened intervals. This method of reasoning may be applied in the linguistic application under study, for example to derive the sentences in (2) above from (1).

A CPA is sound if $V(C_1) \cap \dots \cap V(C_n) \cap V(L_1) \subseteq V(L)$ for every labelling $L$ returned by the algorithm, where $\{C_1, \dots, C_n\}$ is the set of constraints in the system and $L_1$ is the initial labelling. It is complete if $V(L) \subseteq V(C_1) \cap \dots \cap V(C_n) \cap V(L_1)$ for every $L$ that it returns. In other words, the algorithm is sound if it does not eliminate any values that are consistent with the starting state of the system, and complete if it returns only such values.

As we will see, CPAs for intervals can only be complete under very restricted circumstances. Thus Davis defines a weaker form of completeness for the assimilation process. A CPA is complete for assimilation if every labelling $L$ that it returns assigns labels $X_i \doteq [x^i_-, x^i_+]$ such that if $V_i(X_i) \in D([x^i_-, x^i_+])$, then $V_i \in V(C_1) \cap \dots \cap V(C_n)$. That is, the label assigned to each parameter accurately reflects the range of values it may attain given the constraints in the system.

The Waltz algorithm, which is stated below, is superior to many other CPAs in these respects.
It is a sound algorithm, unlike the Monte Carlo method used by [Davis, 1986] and the hill-climber used by [McDermott and Davis, 1984]. Moreover, for constraint systems containing restricted types of constraints, the Waltz algorithm is complete for assimilation and terminates very quickly. In contrast, Davis reports that the hill-climbers used by [McDermott and Davis, 1984] were prohibitively slow and unreliable.

The algorithm is based on an operation called refinement, defined as follows. Given a constraint $C_j$, a parameter $X_i$ appearing in $C_j$, and labelling $L$ define:

    $\mathrm{REFINE}(C_j, X_i, L) = \{V'(X_i) \mid V' \in V(L) \cap V(C_j)\}$

This is the set of values of $X_i$ that are consistent with both the labelling and the constraint.

Relation                  Time complexity   Completeness   Complexity of complete solutions
Order                     O(pc)             Assimilation   O(p ~)
Unit linear inequality    O(pS)*            Incomplete     As hard as linear programming
Product                   O(pS)†‡           Incomplete     NP-hard

Table 3: Complexity of the Waltz algorithm for various systems of relations (from [Davis, 1987] and [Simmons, 1993]).
p = number of parameters, c = number of constraints, S = size of the system (the sum of the lengths of all of the constraints).
* May not terminate if the system is inconsistent.
† Terminates in arbitrarily long (finite) time if the system is inconsistent.
‡ May not terminate if the solution is inadmissible (see text).

The two refinement operators for a constraint $C_j$ and parameter $X_i$ are functions from labellings to labellings, written $R^-(X_i, C_j)$ and $R^+(X_i, C_j)$. If $L(X_i) = [x^i_-, x^i_+]$, then $R^-(X_i, C_j)(L)$ is formed by replacing $x^i_-$ in $L$ with the lower bound of $\mathrm{REFINE}(C_j, X_i, L)$, and $R^+(X_i, C_j)(L)$ is formed by replacing $x^i_+$ in $L$ with the upper bound of $\mathrm{REFINE}(C_j, X_i, L)$. We say that these refinements are based on $C_j$.
If the upper and lower bounds of REFINE are computable, then refinement is by definition a sound operation.

For a constraint system $\mathcal{C} = \langle X, \{C_1, \dots, C_n\}, L \rangle$, $L$ is quiescent for a set of refinement operators $R = \{R_1, \dots, R_m\}$ if $R_1(L) = \dots = R_m(L) = L$. The solution to $\mathcal{C}$ (if it exists) is the labelling $L'$ denoting the largest set of valuations $V(L') \subseteq V(L) \cap V(C_1) \cap \dots \cap V(C_n)$ such that $L'$ is quiescent for any set of refinements based on the constraints in the system. If no such solution exists, then $\mathcal{C}$ is inconsistent.

The Waltz algorithm repeatedly executes refinements until the system is quiescent, and returns the solution (or signals inconsistency) if it terminates (cf. [Davis, 1987, p. 286]).

    procedure WALTZ
        L ← the initial labelling
        Q ← a queue of all constraints
        while Q ≠ ∅ do
        begin
            remove constraint C from Q
            for each Xi appearing in C
                if REFINE(C, Xi, L) = ∅
                    then return INCONSISTENCY
                    else L ← the result of executing R−(Xi, C) and R+(Xi, C) on L
            for each Xi whose label was changed
                for each constraint C' ≠ C in which Xi appears
                    add C' to Q
        end

Since refinement is a sound operation, the Waltz algorithm is sound. The completeness, termination and time complexity of the algorithm depend on what kinds of relations appear as constraints in the system, and on the order in which constraints are taken off the queue. The results for systems consisting exclusively of one of the three kinds of relations mentioned in (7)-(8) in section 2 are given in Table 3, under the assumption that constraints are selected in FIFO order or a fixed sequential order (other orderings lead to worse results). Time complexity is measured as the number of iterations through the main loop of the algorithm.
For comparison, Table 3 also gives the best known times for complete solutions to systems of such relations.⁸

⁸ Hyvönen's [Hyvönen, 1992] tolerance propagation (TP) approach is similar to the Waltz algorithm, but it uses a queue of solution functions from interval arithmetic [Alefeld and Herzberger, 1983] rather than refinement operations. The "global TP" method computes complete solutions, but at the price of increased complexity. In the "local" mode, tolerance propagation is very similar to the Waltz algorithm in its computational properties.

In the linguistic application proposed here, the term $S$ in Table 3 (the sum of the lengths of all of the constraints) is proportional to $c$ (the number of constraints), since there are no more than three parameters in each constraint. Hence, $O(pS)$ is $O(pc)$ in this application.

Note that Table 3 gives results for linear inequalities with unit coefficients (of the form $p \le \sum_i X_i - \sum_j X_j \le q$, where no coefficients differ from 1 or $-1$). These are the only kind of linear inequalities under consideration in the linguistic application. In general, the Waltz algorithm breaks down if the system contains more complex relations, such as linear inequalities with arbitrary coefficients or product relations, since it may go into infinite loops even if the starting state of the system was consistent. Consider, for example, the set of constraints $\{n_1 \times X = Y,\; n_2 \times X = Y\}$ with the starting labels $n_1 \doteq [1, 1]$, $n_2 \doteq [2, 2]$, $X \doteq [0, 100]$ and $Y \doteq [0, 100]$. The system continually bisects the upper bounds of $X$ and $Y$ without ever being able to reach the solution, which is $X \doteq [0, 0]$ and $Y \doteq [0, 0]$. Similarly, if the starting labels are $X \doteq [1, \infty]$ and $Y \doteq [1, \infty]$, then the lower bounds are continually doubled without reaching the solution $X \doteq [\infty, \infty]$ and $Y \doteq [\infty, \infty]$.
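The bisecting behaviour in this example can be reproduced with a minimal Python sketch of the Waltz loop. It is an illustration under stated assumptions, not the implementation analysed in [Davis, 1987] or [Simmons, 1993]: the factors n1 and n2 are treated as constants because their labels are point intervals, labels are (lower, upper) tuples of floats, and the names `refine`, `waltz` and the cut-off parameter `limit` are hypothetical. The cut-off exists only so that the demonstration stops.

```python
from collections import deque


def refine(labels, constraint):
    """Tighten the labels of x and y under the constraint k * x = y (k > 0).

    Returns the set of parameter names whose label changed, or None if a
    label became empty (the system is inconsistent). Assumption: k is a
    positive constant and all labels are non-negative intervals.
    """
    k, x, y = constraint
    x_lo, x_hi = labels[x]
    y_lo, y_hi = labels[y]
    # y must lie inside k * [x_lo, x_hi]
    new_y = (max(y_lo, k * x_lo), min(y_hi, k * x_hi))
    # x must lie inside [y_lo, y_hi] / k, using the tightened label of y
    new_x = (max(x_lo, new_y[0] / k), min(x_hi, new_y[1] / k))
    changed = set()
    for name, new in ((y, new_y), (x, new_x)):
        if new[0] > new[1]:
            return None
        if new != labels[name]:
            labels[name] = new
            changed.add(name)
    return changed


def waltz(labels, constraints, limit=None):
    """Run refinements in FIFO order until quiescence.

    `limit` (hypothetical) stops the run once some constraint has been
    taken off the queue more than `limit` times.
    """
    labels = dict(labels)
    queue = deque(constraints)
    taken = {c: 0 for c in constraints}
    while queue:
        c = queue.popleft()
        taken[c] += 1
        if limit is not None and taken[c] > limit:
            return "cutoff", labels
        changed = refine(labels, c)
        if changed is None:
            return "inconsistent", labels
        # re-queue every other constraint that mentions a changed parameter
        for other in constraints:
            if other != c and other not in queue and changed & {other[1], other[2]}:
                queue.append(other)
    return "quiescent", labels


# The example from the text: 1 * X = Y and 2 * X = Y.
start = {"X": (0.0, 100.0), "Y": (0.0, 100.0)}
system = [(1.0, "X", "Y"), (2.0, "X", "Y")]
status, result = waltz(start, system, limit=2)
print(status, result)
```

With these starting labels each refinement based on the second constraint halves the upper bound of X, and the first constraint then passes the new bound on to Y, so the run ends at the cut-off rather than at quiescence. By contrast, a single constraint such as `(2.0, "X", "Y")` with X = [1, 10] and Y = [4, 100] becomes quiescent after one refinement.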
However, it is shown in [Simmons, 1993] that this happens only if the solution contains labels of this kind. Define a label as admissible if it is not equal to $[0, 0]$ or $[\infty, \infty]$; otherwise, it is inadmissible. A labelling $L$ is admissible if it only assigns admissible labels; otherwise, $L$ is inadmissible. Then it can be shown that if a system of product constraints is consistent and its solution is admissible, then the Waltz algorithm terminates in $O(pS)$ time. Moreover, if the system is inconsistent, the algorithm will find the inconsistency in finite but arbitrarily long time. Unfortunately, the proof is too long to include in the present paper, but a brief outline of the argument is given in the Appendix.

Systems with linear inequalities or product constraints are liable to enter infinite or very long loops if the starting state is inconsistent (or if the solution is inadmissible in the case of products). Davis [1987, p. 305-306] suggests a strong heuristic for detecting and terminating such long loops: stop if we have been through the queue p times (for p parameters). He is not clear on what he means by "having been through the queue p times", but I interpret him as meaning that we should stop if any constraint has been taken off the queue more often than p times. The rationale is the observation that in practice, most systems that do terminate normally seem to do so before this condition is fulfilled, much sooner than the worst-case time predicted by the complexity analysis. The reliability of such a heuristic is one of the topics of the next subsection.

3.3 Empirical Testing

The analytic results given in the previous subsection have left two important questions open:

• What is the complexity of constraint propagation if the system contains different kinds of constraints?
• How reliable is Davis' heuristic for terminating infinite (or very long) loops?

The first question lends itself to an analytic answer, but the results are not known at present. But we can seek empirical evidence by running the algorithm on mixed systems of constraints to see if the time to termination is significantly greater than the complexity expected for systems containing just the most complex type of relation in the system. If this does not happen for a number of representative systems, we may conjecture that the combination of constraints has not made the problem more complex. The second question can only be answered empirically, by testing whether the heuristic tends to terminate the algorithm too soon (i.e. whether it terminates refinement of systems that might have terminated normally in a short time).

Empirical investigations of these questions are reported in [Simmons, 1993], and described briefly here. To investigate the first question, the algorithm was run on a number of large, consistent constraint systems with admissible solutions in which the three types of constraints shown in Table 3 appeared in approximately equal numbers. On each run, the constraints in the initial queue were permuted randomly to suppress the possible effects of ordering. None of these runs required more time to termination than is predicted by the O(pS) result for systems containing just unit linear inequalities or just product constraints.

To investigate the second question, I attempted to build consistent constraint systems with admissible solutions that are terminated by Davis' heuristic sooner than they would have been normally.
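The stopping rule under test can be sketched as follows, under the reading adopted in the previous subsection: refinement halts as soon as some constraint has been taken off the queue more than p times, or more than a chosen multiple of p. This is only an illustrative sketch. It handles product constraints with fixed positive factors only, it mutates the labelling in place, and the names refine, waltz and limit_factor are hypothetical; they are not taken from Davis or from [Simmons, 1993].

```python
from collections import deque


def refine(constraint, labels):
    """Refine the labels of X and Y against n * X = Y (n a fixed positive factor).

    Returns the set of parameter names whose label was narrowed.
    """
    n, x, y = constraint
    (x_lo, x_hi), (y_lo, y_hi) = labels[x], labels[y]
    new_y = (max(y_lo, n * x_lo), min(y_hi, n * x_hi))
    new_x = (max(x_lo, new_y[0] / n), min(x_hi, new_y[1] / n))
    changed = set()
    if new_x != labels[x]:
        labels[x] = new_x
        changed.add(x)
    if new_y != labels[y]:
        labels[y] = new_y
        changed.add(y)
    return changed


def waltz(constraints, labels, limit_factor=1):
    """Queue-based refinement with a cap on how often a constraint is dequeued."""
    limit = limit_factor * len(labels)  # p = number of parameters
    pops = [0] * len(constraints)
    queue = deque(range(len(constraints)))
    while queue:
        i = queue.popleft()
        pops[i] += 1
        if pops[i] > limit:
            return "stopped by heuristic"
        changed = refine(constraints[i], labels)
        if any(lo > hi for lo, hi in labels.values()):
            return "inconsistent"
        # Re-queue every other constraint that mentions a changed parameter.
        for j, (_, x, y) in enumerate(constraints):
            if j != i and j not in queue and (x in changed or y in changed):
                queue.append(j)
    return "quiescent"


# The looping system from Section 3.2, with the factors fixed at 1 and 2.
looping = {"X": (0.0, 100.0), "Y": (0.0, 100.0)}
print(waltz([(1.0, "X", "Y"), (2.0, "X", "Y")], looping))

# A small consistent chain: 2 * X = Y and 3 * Y = Z.
chain = {"X": (1.0, 2.0), "Y": (0.0, 100.0), "Z": (0.0, 9.0)}
print(waltz([(2.0, "X", "Y"), (3.0, "Y", "Z")], chain), chain)
```

Counting per constraint rather than per pass through the queue follows the interpretation stated above; a larger limit_factor trades a longer worst-case run for a lower risk of cutting off a system that would have reached quiescence on its own.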
It turns out that the algorithm runs to completion on almost all systems that were tested long before any constraint is taken off the queue p times, although there are systems for which refinement is terminated too soon on this heuristic. If the limit is increased by a constant factor, e.g. if assimilation is stopped after some constraint is processed 2p times, then the risk of early termination is greatly reduced.

In all, the empirical results on the open questions mentioned above have been encouraging. It is an admitted weakness of these tests, however, that they were performed on systems built by hand, not on constraint systems that occur "naturally" as part of an NL interface to a KR system.

4 Conclusions

The results of the previous section yield Tables 4 and 5 as the complexity of reasoning with the Waltz algorithm under the non-compositional and compositional approaches, respectively. These results depend in part on the fact that there is a maximum number of parameters in each constraint in the linguistic application. Measurements are modelled as predicate constraints, i.e. they simply impose interval bounds on some parameter. Intervals are also assumed to model the range of measurement values for the physical property that is typical for members of a category (e.g. the typical width of refrigerators), thus accounting for the norm used in the interpretation of positives. An important property of such "norm intervals" is that they may not be refined, at least not too much. This may be achieved by adding constraints imposing absolute upper and lower bounds on their ranges (cf. [Simmons, 1992]).
Morphosyntactic Category     Non-compositional Relation   Time Complexity   Completeness
Measurements                 Predicate                    trivial           Complete
Positive                     Order                        O(pc)             Assimilation
Comparative                  Order                        O(pc)             Assimilation
Equative                     Order                        O(pc)             Assimilation
Differential comparative     Linear Inequality            O(pc)*            Incomplete
Equative w/ factor term      Product                      O(pc)†‡           Incomplete

Table 4: Complexity of reasoning under the non-compositional approach

Morphosyntactic Category     Compositional Relation       Time Complexity   Completeness
Measurements                 Predicate                    trivial           Complete
Positive                     Linear Inequality            O(pc)*            Incomplete
Comparative                  Linear Inequality            O(pc)*            Incomplete
Equative                     Product                      O(pc)†‡           Incomplete
Differential comparative     Linear Inequality            O(pc)*            Incomplete
Equative w/ factor term      Product                      O(pc)†‡           Incomplete

Table 5: Complexity of reasoning under the compositional approach

p = number of parameters, c = number of constraints.
* May not terminate if the starting state is inconsistent.
† Terminates in arbitrarily long (finite) time if the system is inconsistent.
‡ May not terminate if the solution is inadmissible.

Although the worst-case time complexity in all cases turns out to be the same, the compositional approach is more complex for two reasons. First, the system is prone to enter infinite loops under the compositional approach if the starting state is inconsistent, or if the solution is inadmissible. Consistency cannot generally be guaranteed in the linguistic application under consideration, since the sentences in a text may contain errors. Second, reasoning under the compositional approach is incomplete in all but the trivial case of measurements, whereas the non-compositional approach guarantees at least assimilation completeness for a subset of the parameters in the system.
This means that under the compositional approach, the reasoner does not refine some intervals as tightly as it could have under the non-compositional approach.

These results may be taken as grounds for rejecting the compositional approach to the semantics of dimensional adjectives in the design of an NL interface to a KR system for quantitative knowledge. However, I do not believe that the compositional approach is contraindicated for all conceivable systems. In addition to the general theoretical appeal of compositional semantics, the compositional formation of meaning representations may be computationally more attractive in some cases (e.g. in unification-based formalisms). Thus if the non-compositional formation of semantic representations turns out to be too expensive, it may defeat the computational advantage gained in the reasoning process.

This is especially true if the weaknesses of the compositional approach do not turn out to be highly relevant in the specific application. For example, if the domain of physical properties being represented is such that a set of constraints requiring some parameter to be set to [0, 0] or [∞, ∞] is unlikely to be encountered, and hence the solution is likely to be admissible, then the risk of infinite loops is reduced. Moreover, if Davis' heuristic for terminating infinite loops turns out to be reliable (which might be determinable by experimentation within the specific application), then inconsistencies need not be very damaging.

The incompleteness of reasoning under the compositional approach is unacceptable for an application if it is crucial that the inferred intervals contain precisely those values that are warranted by the constraints and the initial labelling. If a superset of those values can be accepted, however, then the compositional approach can be taken.
Both approaches suffer a lack of what Davis calls query completeness: if the value of a term T is to be determined during the querying stage (i.e. after assimilation), the system may return a superset of the values for T that are warranted by the constraints.

Thus an engineer building an NL interface to a system for reasoning about uncertain quantitative knowledge of physical properties must make a number of design decisions:

• How important are difference and factor terms in the linguistic material to be processed?

If difference and factor terms are so marginal that they may not occur at all, then the non-compositional approach is probably the better choice, due to its guarantee of termination and assimilation completeness.

• Does the compositional generation of lexical semantic representations have a significant advantage (computational or otherwise) over the non-compositional approach?

• Is it possible or likely for the measurement of some physical property to be exactly zero?

While there is probably no natural application in which the magnitude of some property can be infinitely large, there are different philosophies about the treatment of zero. In a system of temporal reasoning, for example, saying that some event has zero duration may be a way of saying that the event does not exist. But another policy might be to insist that no physical property is represented if it is not exhibited to a positive degree. If this assumption can be made, then the intervals [0, 0] and [∞, ∞] are truly inadmissible, and hence one weakness of the compositional approach is diminished.
• Is it important that the precise range of permissible measurement values be inferred for each parameter, or can a superset of those values be useful?

If a superset of the possible values is acceptable, then the compositional approach can be chosen. Otherwise, the non-compositional approach must be taken.

By weighing the various answers to these questions, an engineer can stake out a position on the tradeoff and design a system with the power and efficiency most appropriate to his or her needs.

Acknowledgements

Thanks to Carola Eschenbach, Claudia Maienborn, Andrea Schopp, Heike Tappe and the referees for their comments on earlier versions of this paper. Thanks also to Longin Latecki for discussions about constraint propagation, and to Christopher Habel for encouraging me to pursue this work.

Appendix

The proof of the following theorem (from [Simmons, 1993]) is briefly outlined below:

Theorem 1 If a system of product constraints is consistent and its solution is admissible, then the Waltz algorithm brings it to quiescence in time O(pS).

Recall that a product constraint is of the form $\prod_i X_i = Y$, and that a labelling is admissible if it does not assign [0, 0] or [∞, ∞] to any parameter. First we need some terminology defined in [Davis, 1987, Appendix B] (recall the definition of refinement operators in section 3.2 above).

For a refinement operator R, let OUT(R) be the bound affected by R, and let ARGS(R) be the set of bounds other than OUT(R) that enter into the computation of OUT(R). Given a labelling L, R is active on L if it changes L, i.e. if L ≠ R(L).
A series of refinement operators $\mathcal{R} = (R_1, \ldots, R_m)$ is active if each refinement in $\mathcal{R}$ is active. We say that $R_i$ is an immediate predecessor of $R_j$ in $\mathcal{R}$ if $i < j$, OUT($R_i$) ∈ ARGS($R_j$), and for all k such that $i < k < j$, OUT($R_k$) ≠ OUT($R_i$). In other words, some argument of $R_j$ has been set most recently in the series by $R_i$. We say that $R_i$ depends on $R_j$ if either $i = j$ or $R_i$ depends on $R_k$ and $R_j$ is an immediate predecessor of $R_k$. Thus the dependence relation is the transitive and reflexive closure of the immediate precedence relation. We say that $R_i$ depends on bound B if for some $R_j$, $R_i$ depends on $R_j$ and B ∈ ARGS($R_j$).

The series of refinements $\mathcal{R} = (R_1, \ldots, R_n)$ is self-dependent if $R_n$ depends on OUT($R_n$), its own output bound. In other words, a series is self-dependent if the last bound affected by the series is also an argument to the first refinement in a chain of refinements in the precedence relation, as illustrated below.

[Figure: chain of dependencies among OUT($R_n$), OUT($R_1$), OUT($R_2$), ...]

Davis shows that such self-dependencies are potential infinite loops:

Theorem 2 Any infinite sequence of active refinements contains an active, self-dependent subsequence ([Davis, 1987, Lemma B.15]).

In [Simmons, 1993], it is shown that if any self-dependent sequence $\mathcal{R}$ is active on the labelling of a system of product constraints, then a certain subsequence $\mathcal{R}'$ of $\mathcal{R}$ will be active infinitely many times. Moreover, on the m-th execution of each refinement $R_i$ in $\mathcal{R}'$, there is a term $T_i^m$, where each $T_i^m > T_i^{m-1} > 1$, such that OUT($R_i$) is multiplied by:

$(T_i^m)^{-1}$, if OUT($R_i$) is an upper bound
$T_i^m$, if OUT($R_i$) is a lower bound
It follows that upper bounds are refined so as to become arbitrarily small (asymptotically approaching zero), and that lower bounds become arbitrarily large, up to infinity.

Thus if there is any constraint $C_i$ in the system that imposes a lowest value greater than zero on an upper bound that is affected by a refinement operator in $\mathcal{R}'$, that bound will be refined often enough until it becomes inconsistent with $C_i$. Similarly, if any constraint $C_u$ imposes a largest finite value on a lower bound that is affected by a refinement in $\mathcal{R}'$, then that bound will be refined until it becomes inconsistent with $C_u$. In both cases, the system is inconsistent.

If there are no such constraints, then it is consistent for upper bounds affected by $\mathcal{R}'$ to be asymptotically close to zero and for lower bounds affected by $\mathcal{R}'$ to be arbitrarily large. This can only be consistent if, in the case of upper bounds, the solution assigns [0, 0] to the parameter in question, and in the case of lower bounds, the solution assigns [∞, ∞] to its parameter. Hence, the solution is inadmissible.

But according to Davis' result (Theorem 2), infinite loops must contain an active, self-dependent subsequence such as $\mathcal{R}$. It follows that if a system of product constraints is consistent and its solution is admissible, then the Waltz algorithm finds its solution in finite time. The time complexity result is a straightforward extension of Davis' analysis of unit linear inequalities (see [Simmons, 1993]).

References

[Alefeld and Herzberger, 1983] G. Alefeld, J. Herzberger. Introduction to Interval Computations. Reading, MA: Addison-Wesley.
[Allen and Kautz, 1985] J. F. Allen, H. A. Kautz. A Model of Naive Temporal Reasoning. In: J. R. Hobbs, R. C. Moore (eds.): Formal Theories of the Commonsense World. Norwood, NJ: Ablex. 251-268.
[Bierwisch, 1989] M. Bierwisch.
The Semantics of Gradation. In: M. Bierwisch, E. Lang (eds.): Dimensional Adjectives. Berlin et al.: Springer-Verlag. 71-261.
[Bobrow, 1985] D. Bobrow (ed.). Qualitative Reasoning about Physical Systems. Cambridge, MA: MIT Press. Reprinted from: Artificial Intelligence 24, 1984.
[Brooks, 1981] R. Brooks. Symbolic Reasoning among 3-D Models and 2-D Images. Artificial Intelligence 17, 285-348.
[Cresswell, 1976] M. J. Cresswell. The Semantics of Degree. In: B. H. Partee (ed.): Montague Grammar. New York: Academic Press. 261-292.
[Davis, 1986] E. Davis. Representing and Acquiring Geographic Knowledge. London: Pitman.
[Davis, 1987] E. Davis. Constraint Propagation with Interval Labels. Artificial Intelligence 32, 281-332.
[Dean, 1987] T. Dean. Large-Scale Temporal Data Bases for Planning in Complex Domains. In: Proceedings of the IJCAI-87. 860-866.
[Hellan, 1981] L. Hellan. Towards an Integrated Analysis of Comparatives. Tuebingen: Narr.
[Hoeksema, 1983] J. Hoeksema. Negative Polarity and the Comparative. Natural Language and Linguistic Theory 1, 403-434.
[Hyvönen, 1992] E. Hyvönen. Constraint reasoning based on interval arithmetic. Artificial Intelligence 58, 71-112.
[Kamp, 1975] J. A. W. Kamp. Two Theories about Adjectives. In: E. L. Keenan (ed.): Formal Semantics of Natural Language. Cambridge: Cambridge Univ. Press. 123-155.
[Klein, 1980] E. Klein. A semantics for positive and comparative adjectives. Linguistics and Philosophy 4, 1-45.
[Klein, 1991] E. Klein. Comparatives. In: A. von Stechow, D. Wunderlich (eds.): Semantics. Berlin: de Gruyter. 673-691.
[Krantz et al., 1971] D. H. Krantz, R. D. Luce, P. Suppes, A. Tversky. Foundations of Measurement. New York, London: Academic Press.
[Lyons, 1977] J. Lyons. Semantics. Vol. 1. Cambridge et al.: Cambridge Univ. Press.
[McDermott and Davis, 1984] D. McDermott, E. Davis. Planning Routes Through Uncertain Territory. Artificial Intelligence 22, 107-156.
[Pinkal, 1990] M. Pinkal.
On the Logical Structure of Comparatives. In: R. Studer (ed.): Natural Language and Logic. Berlin: Springer. 146-167.
[Pinkal, to appear] M. Pinkal. Logic and Lexicon. On the Semantics of the Indefinite. Dordrecht: Kluwer. Translation by G. Simmons of M. Pinkal (1985): Logik und Lexikon. Berlin: de Gruyter.
[Sapir, 1944] E. Sapir. Grading: A Study in Semantics. Philosophy of Science 11, 93-116. Reprinted in: D. G. Mandelbaum (ed.) (1968): Selected Writings of Edward Sapir. Berkeley, Los Angeles: U. Calif. Press.
[Simmons, 1992] G. Simmons. Standardwissen über Normen: Zur konzeptuellen Analyse von Objekten. Master's thesis. Universität Hamburg.
[Simmons, 1993] G. Simmons. Notes on Product Constraints. Report 22, Graduiertenkolleg Kognitionswissenschaft. Universität Hamburg.
[von Stechow, 1984] A. von Stechow. Comparing Semantic Theories of Comparison. Journal of Semantics 3, 1-77.
[Waltz, 1975] D. Waltz. Understanding line drawings of scenes with shadows. In: P. H. Winston (ed.), The Psychology of Computer Vision. McGraw-Hill, New York. 19-91.
[Weld and deKleer, 1990] D. S. Weld, J. deKleer (eds.). Qualitative Reasoning about Physical Systems. San Mateo, CA: Morgan Kaufmann.
