Representation
Roger Evans*
University of Brighton
Gerald Gazdar†
University of Sussex
Much recent research on the design of natural language lexicons has made use of nonmonotonic inheritance networks as originally developed for general knowledge representation purposes in Artificial Intelligence. DATR is a simple, spartan language for defining nonmonotonic inheritance networks with path/value equations, one that has been designed specifically for lexical knowledge representation. In keeping with its intendedly minimalist character, it lacks many of the constructs embodied either in general-purpose knowledge representation languages or in contemporary grammar formalisms. The present paper shows that the language is nonetheless sufficiently expressive to represent concisely the structure of lexical information at a variety of levels of linguistic analysis. The paper provides an informal example-based introduction to DATR and to techniques for its use, including finite-state transduction, the encoding of DAGs and lexical rules, and the representation of ambiguity and alternation. Sample analyses of phenomena such as inflectional syncretism and verbal subcategorization are given that show how the language can be used to squeeze out redundancy from lexical descriptions.
1. Introduction
Irregular lexemes are standardly regular in some respect. Most are just like regular lexemes except that they deviate in one or two characteristics. What is needed is a natural way of saying "this lexeme is regular except for this property." One obvious approach is to use nonmonotonicity and inheritance machinery to capture such lexical irregularity (and subregularity), and much recent research into the design of representation languages for natural language lexicons has thus made use of nonmonotonic inheritance networks (or "semantic nets") as originally developed for more general representation purposes in Artificial Intelligence. Daelemans, De Smedt, and Gazdar (1992) provide a rationale for, and an introduction to, this body of research and we will not rehearse the content of that paper here, nor review the work cited there. 1 DATR is a rather spartan nonmonotonic language for defining inheritance networks with path/value equations. In keeping with its intendedly minimalist character, it lacks many of the constructs embodied either in general-purpose knowledge representation languages or in contemporary grammar formalisms. But the present paper seeks to
* Information Technology Research Institute, University of Brighton, Brighton BN2 4AT, U.K. E-mail: [email protected]
† Cognitive & Computing Sciences, University of Sussex, Brighton BN1 9QH, U.K. E-mail: [email protected]
1 Daelemans and Gazdar (1992) and Briscoe, de Paiva, and Copestake (1993) are collections that bring
Computational Linguistics Volume 22, Number 2
show that the language is nonetheless sufficiently expressive to represent concisely the structure of lexical information at a variety of levels of language description.
The development of DATR has been guided by a number of concerns, which we summarize here. Our objective has been a language that (i) has an explicit theory of inference, (ii) has an explicit declarative semantics, (iii) can be readily and efficiently implemented, (iv) has the necessary expressive power to encode the lexical entries presupposed by work in the unification grammar tradition, and (v) can express all the evident generalizations and subgeneralizations about such entries. Our first publications on DATR (Evans and Gazdar 1989a, 1989b) provided a formal theory of inference (i) and a formal semantics (ii) for DATR and we will not recapitulate that material here. 2 With respect to (iii), the core inference engine for DATR can be coded in a page of Prolog (see, e.g., Gibbon 1993, 50). At the time of writing, we know of a dozen different implementations of the language, some of which have been used with large DATR lexicons in the context of big NLP systems (e.g., Andry et al. 1992; Cahill 1993a, 1994; Cahill and Evans 1990). We will comment further on implementation matters in Section 5, below. However, the main purpose of the present paper is to exhibit the use of DATR for lexical description (iv) and the way it makes it relatively easy to capture lexical generalizations and subregularities at a variety of analytic levels (v). We will pursue (iv) and (v) in the context of an informal example-based introduction to the language and to techniques for its use, and we will make frequent reference to the DATR-based lexical work that has been done since 1989.
The paper is organized as follows: Section 2 uses an analysis of English verbal morphology to provide an informal introduction to DATR. Section 3 describes the language more precisely: its syntax, inferential and default mechanisms, and the use of abbreviatory variables. Section 4 describes a wide variety of DATR techniques, including case constructs and parameters, Boolean logic, finite-state transduction, lists and DAGs, lexical rules, and ways to encode ambiguity and alternation. Section 5 explores more technical issues relating to the language, including functionality and consistency, multiple-inheritance, modes of use, and existing implementations. Section 6 makes some closing observations. Finally, an appendix to the paper replies to the points made in the critical literature on DATR.
2. DATR by Example
We begin our presentation of DATR with a partial analysis of morphology in the English verbal system. In DATR, information is organized as a network of nodes, where a node is essentially just a collection of closely related information. In the context of lexical description, a node typically corresponds to a word, a lexeme, or a class of lexemes. For example, we might have a node describing an abstract verb, another for the subcase of a transitive verb, another for the lexeme love, and still more for the individual words that are instances of this lexeme (love, loves, loved, loving, etc.). Each node has associated with it a set of path/value pairs, where a path is a sequence of atoms (which are primitive objects), and a value is an atom or a sequence of atoms. We will sometimes refer to atoms in paths as attributes. For example, a node describing the present participle form of the verb love (and called perhaps Word1) might contain the path/value pairs shown in Table 1.
Table 1
Path/value pairs for present participle of love.
Path        Value
syn cat     verb
syn type    main
syn form    present participle
mor form    love ing
The paths in this example all happen to contain two attributes, and the first attribute can be thought of as distinguishing syntactic and morphological types of information. The values indicate appropriate linguistic settings for the paths for a present participle form of love. Thus, its syntactic category is verb, its syntactic type is main (i.e., it is a main verb, not an auxiliary), its syntactic form is present participle (a two-atom sequence), its morphological form is love ing (another two-atom sequence). In DATR this can be written as: 3
Word1:
    <syn cat> = verb
    <syn type> = main
    <syn form> = present participle
    <mor form> = love ing.
Here, angle brackets (<...>) delimit paths. Note that values can be atomic or they can consist of sequences of atoms, as the two last lines of the example illustrate. 4 As a first approximation, nodes can be thought of as denoting partial functions from paths (sequences of atoms) to values (sequences of atoms). 5
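The "partial function" reading of a node can be made concrete with a few lines of Python. This is only an illustrative sketch, not part of DATR: the dictionary `word1` and the helper `value` are names assumed here for the illustration. Paths and values are both modelled as tuples of atoms, and an undefined path simply has no entry.

```python
# A node modelled as a partial function from paths to values.
# Both paths and values are sequences of atoms, here tuples of strings.
# The names `word1` and `value` are assumptions made for this sketch.

word1 = {
    ("syn", "cat"): ("verb",),
    ("syn", "type"): ("main",),
    ("syn", "form"): ("present", "participle"),
    ("mor", "form"): ("love", "ing"),
}

def value(node, path):
    """Return the value sequence for `path`, or None where the node is undefined."""
    return node.get(tuple(path))

print(value(word1, ["syn", "form"]))  # ('present', 'participle')
print(value(word1, ["sem", "pred"]))  # None: the function is partial
```

Note that a one-atom value such as verb is still a sequence (of length one) in this encoding, which keeps atomic and non-atomic values uniform.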
In itself, this tiny fragment of DATR is not persuasive, apparently allowing only for the specification of words by simple listing of path/value statements for each one. It seems that if we wished to describe the passive form of love we would have to write:
Word2:
    <syn cat> = verb
    <syn type> = main
    <syn form> = passive participle
    <mor form> = love ed.
This does not seem very helpful: the whole point of a lexical description language is to capture generalizations and avoid the kind of duplication evident in the specification of Word1 and Word2. And indeed, we shall shortly introduce an inheritance mechanism that allows us to do just that. But there is one sense in which this listing approach
3 The syntax of DATR, like its name and its minimalist philosophy, owes more than a little to that of the unification grammar language PATR (Shieber 1986). With hindsight this may have been a bad design decision since similarity of syntax tends to imply a similarity of semantics. And, as we shall see in Section 4.7 below, and elsewhere, there is a subtle but important semantic difference.
4 Node names and atoms are distinct, but essentially arbitrary, classes of tokens in DATR. In this paper we shall distinguish them by a simple case convention--node names start with an uppercase letter, atoms do not.
is exactly what we want: it represents the actual information we generally wish to access from the description. So in a sense, we do want all the above statements to be present in our description; what we want to avoid is repeated specification of the common elements.
This problem is overcome in DATR in the following way: such exhaustively listed path/value statements are indeed present in a description, but typically only implicitly present. Their presence is a logical consequence of a second set of statements, which have the concise, generalization-capturing properties we expect. To make the distinction sharp, we call the first type of statement extensional and the second type definitional. Syntactically, the distinction is made with the equality operator: for extensional statements (as above), we use =, while for definitional statements we use ==. And, although our first example of DATR consisted entirely of extensional statements, almost all the remaining examples will be definitional. The semantics of the DATR language binds the two together in a declarative fashion, allowing us to concentrate on concise definitions of the network structure from which the extensional "results" can be read off.
Our first step towards a more concise account of Word1 and Word2 is simply to change the extensional statements to definitional ones:
Word1:
    <syn cat> == verb
    <syn type> == main
    <syn form> == present participle
    <mor form> == love ing.
Word2:
    <syn cat> == verb
    <syn type> == main
    <syn form> == passive participle
    <mor form> == love ed.
This is possible because DATR respects the unsurprising condition that if at some node a value is specifically defined for a path with a definitional statement, then the corresponding extensional statement also holds. So the statements we previously made concerning Word1 and Word2 remain true, but now only implicitly true.
Although this change does not itself make the description more concise, it allows us to introduce other ways of describing values in definitional statements, in addition to simply specifying them. Such value descriptors will include inheritance specifications that allow us to gather together the properties that Word1 and Word2 have solely by virtue of being verbs. We start by introducing a VERB node:
VERB:
    <syn cat> == verb
    <syn type> == main.
and then redefine Word1 and Word2 to inherit their verb properties from it. A direct encoding for this is as follows:
Word1:
    <syn cat> == VERB:<syn cat>
    <syn type> == VERB:<syn type>
    <syn form> == present participle
    <mor form> == love ing.
Word2:
    <syn cat> == VERB:<syn cat>
    <syn type> == VERB:<syn type>
    <syn form> == passive participle
    <mor form> == love ed.
In these revised definitions the right-hand side of the <syn cat> statement is not a direct value specification, but instead an inheritance descriptor. This is the simplest form of DATR inheritance: it just specifies a new node and path from which to obtain the required value. It can be glossed roughly as "the value associated with <syn cat> at Word1 is the same as the value associated with <syn cat> at VERB." Thus from VERB:<syn cat> == verb it now follows that Word1:<syn cat> == verb. 6
However, this modification to our analysis seems to make it less concise, rather than more. It can be improved in two ways. The first is really just a syntactic trick: if the path on the right-hand side is the same as the path on the left-hand side, it can be omitted. So we can replace VERB:<syn type>, in the example above, with just VERB. We can also extend this abbreviation strategy to cover cases like the following, where the path on the right-hand side is different but the node is the same:
Come:
    <mor root> == come
    <mor past participle> == Come:<mor root>.
In this case we can simply omit the node:
Come:
    <mor root> == come
    <mor past participle> == <mor root>.
The other improvement introduces one of the most important features of DATR--specification by default. Recall that paths are sequences of attributes. If we understand paths to start at their left-hand end, we can construct a notion of path extension: a path P2 extends a path P1 if and only if all the attributes of P1 occur in the same order at the left-hand end of P2 (so <a1 a2 a3> extends <>, <a1>, <a1 a2>, and <a1 a2 a3>, but not <a2>, <a1 a3>, etc.). If we now consider the (finite) set of paths occurring in definitional statements associated with some node, that set will not include all possible paths (of which there are infinitely many). So the question arises of what we can say about paths for which there is no specific definition. For some path P1 not defined at node N, there are two cases to consider: either P1 is the extension of some path defined at N or it is not. The latter case is easiest--there is simply no definition for P1 at N (hence N can be a partial function, as already noted above). But in the former case, where P1 extends some P2 which is defined at N, P1 assumes a definition "by default." If P2 is the only path defined at N which P1 extends, then P1 takes its definition from the definition of P2. If P1 extends several paths defined at N, it takes its definition from the most specific (i.e., the longest) of the paths that it extends.
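The notion of path extension, and the choice of the most specific defined path, can be sketched in Python as follows. The function names `extends` and `most_specific` are assumptions introduced for this illustration; paths are again modelled as tuples of atoms.

```python
# Sketch of path extension and of default selection by longest leading subpath.
# `extends` and `most_specific` are hypothetical helper names for this example.

def extends(p2, p1):
    """True if path p2 extends path p1, i.e. p1 is a leading subpath of p2."""
    return tuple(p2[:len(p1)]) == tuple(p1)

def most_specific(defined, path):
    """Return the longest path in `defined` that `path` extends, or None if there is none."""
    candidates = [p for p in defined if extends(path, p)]
    return max(candidates, key=len) if candidates else None

a = ("a1", "a2", "a3")
# <a1 a2 a3> extends <>, <a1>, <a1 a2>, and itself ...
print([extends(a, p) for p in [(), ("a1",), ("a1", "a2"), a]])
# ... but not <a2> or <a1 a3>.
print(extends(a, ("a2",)), extends(a, ("a1", "a3")))

# A node that defines <syn> and <syn form>: the longer path wins where both apply.
defined = [("syn",), ("syn", "form")]
print(most_specific(defined, ("syn", "form", "x")))  # ('syn', 'form')
print(most_specific(defined, ("syn", "cat")))        # ('syn',)
print(most_specific(defined, ("mor", "form")))       # None: no default applies
```

Since all candidate paths are leading subpaths of the same query path, at most one candidate exists for each length, so the longest one is always unique.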
In the present example, this mode of default specification can be applied as follows:
We have two statements at Word1 that (after applying the abbreviation introduced above) both inherit from VERB:
Word1:
    <syn cat> == VERB
    <syn type> == VERB.
Because they have a common leading subpath <syn>, we can collapse them into a single statement about <syn> alone:
Word1:
    <syn> == VERB.
If this were the entire definition of Word1, the default mechanism would ensure that all extensions of <syn> (including the two that concern us here) would be given the same definition--inheritance from VERB. But in our example, of course, there are other statements concerning Word1. If we add these back in, the complete definition looks like this:
Word1:
    <syn> == VERB
    <syn form> == present participle
    <mor form> == love ing.
The paths <syn type> and <syn cat> (and also many others, such as <syn cat foo>, <syn baz>) obtain their definitions from <syn> using the default mechanism just introduced, and so inherit from VERB. The path <syn form>, being explicitly defined, is exempt from this default behavior, and so retains its value definition, present participle; any extensions of <syn form> obtain their definitions from <syn form> rather than <syn> (since it is a more specific leading subpath), and so will have the value present participle also.
The net effect of this definition for Word1 can be glossed as "Word1 stipulates its morphological form to be love ing and inherits values for its syntactic features from VERB, except for <syn form>, which is present participle." More generally, this mechanism allows us to define nodes differentially: by inheritance from default specifications, augmented by any nondefault settings associated with the node at hand. In fact, the Word1 example can take this default inheritance one step further, by inheriting everything (not just <syn>) from VERB, except for the specifically mentioned values:
Word1:
    <> == VERB
    <syn form> == present participle
    <mor form> == love ing.
Here the empty path < > is a leading subpath of every path, and so acts as a "catch all"--any path for which no more specific definition at Word1 exists will inherit from VERB. Inheritance via the empty path is ubiquitous in real DATR lexicons but it should be remembered that the empty path has no special formal status in the language.
In this way, Word1 and Word2 can both inherit their general verbal properties from VERB. Of course, these two particular forms have more in common than simply being
lexeme, we can provide a site for properties shared by all forms of love (in this simple example, just its morphological root and the fact that it is a verb).
VERB:
    <syn cat> == verb
    <syn type> == main.
Love:
    <> == VERB
    <mor root> == love.
Word1:
    <> == Love
    <syn form> == present participle
    <mor form> == <mor root> ing.
Word2:
    <> == Love
    <syn form> == passive participle
    <mor form> == <mor root> ed.
So now Word1 inherits from Love rather than VERB (but Love inherits from VERB, so the latter's definitions are still present at Word1). However, instead of explicitly including the atom love in the morphological form, the value definition includes the descriptor <mor root>. This descriptor is equivalent to Word1:<mor root> and, since <mor root> is not defined at Word1, the empty path definition applies, causing it to inherit from Love:<mor root>, and thereby return the expected value, love. Notice here that each element of a value can be defined entirely independently of the others; for <mor form> we now have an inheritance descriptor for the first element and a simple value for the second.
Our toy fragment is beginning to look somewhat more respectable: a single node for abstract verbs, a node for each abstract verb lexeme, and then individual nodes for each morphological form of each verb; but there is still more that can be done. Our focus on a single lexeme has meant that one class of redundancy has remained hidden. The line
    <mor form> == <mor root> ing
will occur in every present participle form of every verb, yet it is a completely generic statement that can be applied to all English present participle verb forms. Can we not replace it with a single statement in the VERB node? Using the mechanisms we have seen so far, the answer is no. The statement would have to be (i), which is equivalent to (ii), whereas the effect we want is (iii):
    (i)   VERB:<mor form> == <mor root> ing
    (ii)  VERB:<mor form> == VERB:<mor root> ing
    (iii) VERB:<mor form> == Word1:<mor root> ing
The problem is that the inheritance mechanism we have been using is local, in the sense that it can only be used to inherit either from a specifically named node (and/or path), or relative to the local context of the node (and/or path) at which it is defined. What we need is a way of specifying inheritance relative to the original node/path specification whose value we are trying to determine, rather than the one we have reached by following inheritance links. We shall refer to this original specification as the query we are attempting to evaluate, and the node and path associated with this query as the global context. 7 Global inheritance, that is, inheritance relative to the global context, is indicated in DATR by using quoted ("...") descriptors, and we can use it to extend our definition of VERB as follows:
VERB:
    <syn cat> == verb
    <syn type> == main
    <mor form> == "<mor root>" ing.
Here we have added a definition for <mor form> that contains the quoted path "<mor root>". Roughly speaking, this is to be interpreted as "inherit the value of <mor root> from the node originally queried." With this extra definition, we no longer need a <mor form> definition in Word1, so it becomes:
Word1:
    <> == Love
    <syn form> == present participle.
To see how this global inheritance works, consider the query Word1:<mor form>. Since <mor form> is not defined at Word1, it will inherit from VERB via Love. This specifies inheritance of <mor root> from the query node, which in this case is Word1. The path <mor root> is not defined at Word1 but inherits the value love from Love. Finally, the definition of <mor form> at VERB adds an explicit ing, resulting in a value of love ing for Word1:<mor form>. Had we begun evaluation at, say, a daughter of the lexeme Eat, we would have been directed from VERB:<mor form> back to the original daughter of Eat to determine its <mor root>, which would be inherited from Eat itself; we would have ended up with the value eat ing.
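The contrast between local and global inheritance can be mimicked with a very small interpreter. The Python sketch below is not one of the DATR implementations referred to in this paper. It covers only a subset of the language (atoms, node descriptors, and simple local or quoted paths), and the encoding of descriptors as `(kind, node, path)` triples, together with the names `lookup`, `query`, `THEORY`, and `LOCAL_VARIANT`, are assumptions made for the illustration.

```python
# Minimal sketch of local versus global (quoted) inheritance.
# A descriptor is either an atom (str) or a triple (kind, node, path), where
# kind is "local" or "global" and node/path may be None when omitted.

THEORY = {
    "VERB": {
        ("syn", "cat"): ("verb",),
        ("syn", "type"): ("main",),
        ("mor", "form"): (("global", None, ("mor", "root")), "ing"),  # "<mor root>" ing
    },
    "Love": {
        (): (("local", "VERB", None),),            # <> == VERB
        ("mor", "root"): ("love",),
    },
    "Word1": {
        (): (("local", "Love", None),),            # <> == Love
        ("syn", "form"): ("present", "participle"),
    },
}

def lookup(theory, node, path):
    """Definition by default: use the longest defined leading subpath of `path`."""
    defs = theory[node]
    for n in range(len(path), -1, -1):
        if path[:n] in defs:
            return defs[path[:n]], path[n:]
    raise LookupError(f"{node}:<{' '.join(path)}> is undefined")

def query(theory, node, path, origin=None):
    """Evaluate node:path; `origin` is the node of the global context."""
    origin = origin or node
    rhs, rest = lookup(theory, node, path)
    value = []
    for d in rhs:
        if isinstance(d, str):                     # an atom contributes itself
            value.append(d)
            continue
        kind, target, subpath = d
        new_path = path if subpath is None else subpath + rest
        if kind == "local":                        # stay in the current context
            value += query(theory, target or node, new_path, origin)
        else:                                      # global: go back to the query node
            start = target or origin
            value += query(theory, start, new_path, start)
    return tuple(value)

print(query(THEORY, "Word1", ("mor", "form")))     # ('love', 'ing')

# Statement (i) uses an unquoted <mor root>, so it is looked up at VERB and fails.
LOCAL_VARIANT = dict(THEORY, VERB=dict(THEORY["VERB"]))
LOCAL_VARIANT["VERB"][("mor", "form")] = (("local", None, ("mor", "root")), "ing")
try:
    query(LOCAL_VARIANT, "Word1", ("mor", "form"))
except LookupError as err:
    print("local version fails:", err)
```

The only difference between the two theories is the quoting of `<mor root>` at VERB, which is exactly the difference between readings (ii) and (iii) above.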
The analysis is now almost the way we would like it to be. Unfortunately, by moving <mor form> from Word1 to VERB, we have introduced a new problem: we have specified the present participle as the (default) value of <mor form> for all verbs. Clearly, if we want to specify other forms at the same level of generality, then <mor form> is currently misnamed: it should be <mor present participle>, so that we can add <mor past participle>, <mor present tense>, etc. If we make this change, then the VERB node will look like this:
VERB:
    <syn cat> == verb
    <syn type> == main
    <mor past> == "<mor root>" ed
    <mor passive> == "<mor past>"
    <mor present> == "<mor root>"
    <mor present participle> == "<mor root>" ing
    <mor present tense sing three> == "<mor root>" s.
In adding these new specifications, we have added a little extra structure as well. The passive form is asserted to be the same as the past form--the use of global inheritance here ensures that irregular or subregular past forms result in irregular or subregular passive forms, as we shall see shortly. The paths introduced for the present forms illustrate another use of default definitions. We assume that the morphology of present tense forms is specified with paths of five attributes, the fourth specifying number, the fifth, person. Here we define default present morphology to be simply the root, and this generalizes to all the longer forms, except the present participle and the third person singular.
For Love, given these changes, the following extensional statements hold, inter alia:
Love:
    <syn cat> = verb
    <syn type> = main
    <mor present tense sing one> = love
    <mor present tense sing two> = love
    <mor present tense sing three> = love s
    <mor present tense plur> = love
    <mor present participle> = love ing
    <mor past tense sing one> = love ed
    <mor past tense sing two> = love ed
    <mor past tense sing three> = love ed
    <mor past tense plur> = love ed
    <mor past participle> = love ed
    <mor passive participle> = love ed.
There remains one last problem in the definitions of Word1 and Word2. The morphological form of Word1 is now given by <mor present participle>. Similarly, Word2's morphological form is given by <mor passive participle>. There is no longer a unique path representing morphological form. This can be corrected by the addition of a single statement to VERB:
VERB:
    <mor form> == "<mor "<syn form>">".
This statement employs a DATR construct, the evaluable path, which we have not encountered before. The right-hand side consists of a (global) path specification, one of whose component attributes is itself a descriptor that must be evaluated before the outer path can be. The effect of the above statement is to say that <mor form> globally inherits from the path given by the atom mor followed by the global value of
<syn form>. For Word1, <syn form> is present participle, so <mor form> inherits from <mor present participle>. But for Word2, <mor form> inherits from <mor passive participle>. Effectively, <syn form> is being used here as a parameter
    <mor root> == mow.
Sew:
    <> == EN_VERB
    <mor root> == sew.
As noted above, the passive forms of these subregular verbs will be correct now as well, because of the use of a global cross-reference to the past participle form in the VERB node. For example, the definition of the passive form of sew is:
Word3:
    <> == Sew
    <syn form> == passive participle.
If we seek to establish the <mor form> of Word3, we are sent up the hierarchy of nodes, first to Sew, then to EN_VERB, and then to VERB. Here we encounter "<mor "<syn form>">", which resolves to "<mor passive participle>" in virtue of the embedded global reference to <syn form> at Word3. This means we now have to establish the value of <mor passive participle> at Word3. Again, we ascend the hierarchy to VERB and find ourselves referred to the global descriptor "<mor past participle>". This takes us back to Word3, from where we again climb, first to Sew, then to EN_VERB. Here, <mor past participle> is given as the sequence "<mor root>" en. This leads us to look for the <mor root> of Word3, which we find at Sew, giving the result we seek:
Word3:
    <mor form> = sew en.
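The derivation just traced, including the evaluable path, can be replayed with a short Python sketch. It is an illustration under several assumptions rather than a DATR implementation: it handles only atoms, local node descriptors, and quoted paths; the helper names `N`, `G`, `lookup`, `evaluate`, and `query` are invented here; the EN_VERB entry is reconstructed from the prose description above (inheritance from VERB plus a past participle in en); and unmatched path extensions are appended to embedded paths as well, which is our reading of the default mechanism.

```python
# Sketch: evaluable paths and global inheritance for the Word3 example.

def N(node):
    """Local node descriptor such as Sew: same path, different node."""
    return ("local", node, None)

def G(*parts):
    """Quoted path descriptor; parts are atoms or nested descriptors."""
    return ("global", None, parts)

THEORY = {
    "VERB": {
        ("syn", "cat"): ("verb",),
        ("syn", "type"): ("main",),
        ("mor", "form"): (G("mor", G("syn", "form")),),    # "<mor "<syn form>">"
        ("mor", "past"): (G("mor", "root"), "ed"),
        ("mor", "passive"): (G("mor", "past"),),
        ("mor", "present"): (G("mor", "root"),),
        ("mor", "present", "participle"): (G("mor", "root"), "ing"),
        ("mor", "present", "tense", "sing", "three"): (G("mor", "root"), "s"),
    },
    # Assumption: reconstructed from the description of EN_VERB in the text.
    "EN_VERB": {
        (): (N("VERB"),),
        ("mor", "past", "participle"): (G("mor", "root"), "en"),
    },
    "Sew": {(): (N("EN_VERB"),), ("mor", "root"): ("sew",)},
    "Word3": {(): (N("Sew"),), ("syn", "form"): ("passive", "participle")},
}

def lookup(node, path):
    """Return the definition of the longest defined leading subpath, plus the remainder."""
    defs = THEORY[node]
    for n in range(len(path), -1, -1):
        if path[:n] in defs:
            return defs[path[:n]], path[n:]
    raise LookupError(f"{node}:<{' '.join(path)}> is undefined")

def evaluate(desc, node, path, rest, origin):
    """Evaluate one descriptor to a flat tuple of atoms."""
    if isinstance(desc, str):
        return (desc,)
    kind, target, parts = desc
    if parts is None:
        new_path = path
    else:
        resolved = ()
        for part in parts:              # embedded descriptors are evaluated first
            resolved += evaluate(part, node, path, rest, origin)
        new_path = resolved + rest      # then the unmatched extension is appended
    if kind == "global":
        start = target or origin        # restart from the originally queried node
        return query(start, new_path, start)
    return query(target or node, new_path, origin)

def query(node, path, origin=None):
    origin = origin or node
    rhs, rest = lookup(node, path)
    value = ()
    for desc in rhs:
        value += evaluate(desc, node, path, rest, origin)
    return value

print(" ".join(query("Word3", ("mor", "form"))))   # sew en
```

Because every descriptor evaluates to a tuple that is concatenated into the result, the value returned is always a flat sequence of atoms, however deeply the descriptors that define it are nested.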
Irregularity can be treated as just the limiting case of subregularity, so, for example, the morphology of Do can be specified as follows: 10
Do:
    <> == VERB
    <mor root> == do
    <mor past> == did
    <mor past participle> == done
    <mor present tense sing three> == does.
Likewise, the morphology of Be can be specified as follows: 11
Be:
    <> == EN_VERB
    <mor root> == be
    <mor present tense sing one> == am
    <mor present tense sing three> == is
    <mor present tense plur> == are
    <mor past tense sing one> == <mor past tense sing three>
    <mor past tense sing three> == was
    <mor past tense plur> == were.
10 Orthographically, the form does could simply be treated as regular (from do s). However, we have chosen to stipulate it here since, although the spelling appears regular, the phonology is not, so in a lexicon that defined phonological forms it would need to be stipulated.
In this section, we have moved from simple attribute/value listings to a compact, generalization-capturing representation for a fragment of English verbal morphology. In so doing, we have seen examples of most of the important ingredients of DATR: local and global descriptors, definition by default, and evaluable paths.
3. The DATR Language
3.1 Syntax
A DATR description consists of a sequence of sentences corresponding semantically to a set of statements. Sentences are built up out of a small set of basic expression types, which themselves are built up out of sequences of lexical tokens, which we take to be primitive.
In the previous section, we referred to individual lines in DATR definitions as statements. Syntactically, however, a DATR description consists of a sequence of sentences, where each sentence starts with a node name and ends with a period, and contains one or more path equations relating to that node, each corresponding to a statement in DATR. This distinction between sentences and statements is primarily for notational convenience (it would be cumbersome to require repetition of the node name for each statement) and statements are the primary unit of specification in DATR. For the purposes of this section, where we need to be particularly clear about this distinction, we shall call a sentence containing just a single statement a simple sentence.
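The relation between a sentence and the statements it abbreviates can be illustrated by expanding a sentence into the equivalent simple sentences. The Python sketch below is a rough illustration, not a DATR parser: the function name `simple_sentences` is an assumption, and it presupposes that every left-hand side is a simple path (no nested angle brackets) and that the text contains no comments or quoted atoms.

```python
import re

# Sketch: expand one DATR sentence into simple sentences, one per statement.
# Assumes left-hand sides are simple paths; `simple_sentences` is a hypothetical name.

LHS = re.compile(r"(<[^<>]*>)\s*(==|=)")   # a path immediately followed by == or =

def simple_sentences(sentence: str) -> list[str]:
    node, body = sentence.strip().split(":", 1)
    body = body.strip()
    if not body.endswith("."):
        raise ValueError("a DATR sentence must end with a period")
    body = body[:-1]
    marks = list(LHS.finditer(body))
    result = []
    for i, m in enumerate(marks):
        # The right-hand side runs up to the next left-hand side (or the end).
        end = marks[i + 1].start() if i + 1 < len(marks) else len(body)
        rhs = " ".join(body[m.end():end].split())
        result.append(f"{node.strip()}:{m.group(1)} {m.group(2)} {rhs}.")
    return result

sentence = """
Word1:
    <> == Love
    <syn form> == present participle
    <mor form> == <mor root> ing.
"""
for s in simple_sentences(sentence):
    print(s)
```

A path on the right-hand side, such as `<mor root>` above, is not mistaken for a new left-hand side because it is not followed by an equality operator.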
3.1.1 Lexical Tokens. The syntax of DATR distinguishes four classes of lexical token: nodes, atoms, variables, and reserved symbols. The complete list of reserved symbols is as follows:
    : " . = == < > ' % #
We have already seen the use of the first seven of these. Single quotes can be used to form atoms that would otherwise be ill-formed as such; % is used for end-of-line comments, following the Prolog convention; # is used to introduce declarations and other compiler directives. 12
The other classes, nodes, atoms, and variables, must be distinct, and distinct from the reserved symbols, but are otherwise arbitrary. 13 For this discussion, we have already adopted the convention that both nodes and atoms are simple words, with nodes starting with uppercase letters. We extend this convention to variables, discussed more fully in Section 3.4 below, which we require to start with the character $. And we take white space (spaces, new lines, tabs, etc.) to delimit lexical tokens but otherwise to be insignificant.
3.1.2 Right-hand-side Expressions. The expressions that may appear as the right-hand sides of DATR equations are sequences of zero or more descriptors. 14 Descriptors are defined recursively, and come in seven kinds. The simplest descriptor is just an atom or variable:
    atom1
    $var1
Then there are three kinds of local inheritance descriptor: a node, an (evaluable) path, and a node/path pair. Nodes are primitive tokens, paths are descriptor sequences (defined below) enclosed in angle brackets and node/path pairs consist of a node and a path, separated by a colon:
    Node1
    <desc1 desc2 desc3 ...>
    Node1:<desc1 desc2 desc3 ...>
Finally there are three kinds of global inheritance descriptor, which are quoted variants of the three local types just described:
"Node1"
"<desc1 desc2 desc3 ...>"
"Node1:<desc1 desc2 desc3 ...>"
A descriptor sequence is a (possibly empty) sequence of descriptors. The recursive definition of evaluable paths in terms of descriptor sequences allows arbitrarily complex expressions to be constructed, such as: 15
"Node1:<"<atom1>" Node2:<atom2>>"
"<"<"<Node1:<atom1 atom2> atom3>" Node2 "<atom4 atom5>" <> >">"
But the value sequences determined by such definitions are flat: they have no structure beyond the simple sequence and in particular do not reflect the structure of the descriptors that define them.
We shall sometimes refer to descriptor sequences containing only atoms as simple values, and similarly, (unquoted) path expressions containing only atoms as simple paths.
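One way to picture the recursive structure of descriptors is as a small data type. The following Python sketch is illustrative only: the class names `Atom`, `Var`, and `Inherit`, and the helper functions, are assumptions made here and do not come from any DATR implementation. Atoms and variables are reported separately by `kind`, although the text counts them together as one kind of descriptor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Inherit:
    """A node, path, or node/path descriptor; quoted=True makes it global."""
    node: Optional[str] = None
    path: Optional[Tuple["Descriptor", ...]] = None  # a descriptor sequence
    quoted: bool = False

Descriptor = Union[Atom, Var, Inherit]

def kind(d):
    """Name the kind of descriptor d."""
    if isinstance(d, Atom):
        return "atom"
    if isinstance(d, Var):
        return "variable"
    scope = "global" if d.quoted else "local"
    if d.path is None:
        shape = "node"
    elif d.node is None:
        shape = "path"
    else:
        shape = "node/path"
    return f"{scope} {shape}"

def is_simple_value(seq):
    """A descriptor sequence containing only atoms."""
    return all(isinstance(d, Atom) for d in seq)

def is_simple_path(d):
    """An unquoted path expression containing only atoms."""
    return (isinstance(d, Inherit) and d.node is None and not d.quoted
            and d.path is not None and is_simple_value(d.path))

# The first complex example above: "Node1:<"<atom1>" Node2:<atom2>>"
example = Inherit(
    node="Node1",
    path=(Inherit(path=(Atom("atom1"),), quoted=True),
          Inherit(node="Node2", path=(Atom("atom2"),))),
    quoted=True,
)
```

The nesting lives entirely in the descriptor; any value eventually computed from it would still be a flat sequence of atoms.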
3.1.3 Sentences. DATR sentences represent the statements that make up a description. As we have already seen, there are two basic statement types, extensional and definitional, and these correspond directly to simple extensional and definitional sentences, which are made up from the components introduced in the preceding section.
14 DATR makes a distinction between a path not having a value (i.e., being undefined) and a path having the empty sequence as a value:
NUM:
<two> ==
<one> == one.
In this example, NUM:<one> has the value one, NUM:<two> has the empty sequence as its value, and NUM:<three> is simply undefined.
Computational Linguistics Volume 22, Number 2
Simple extensional sentences take the form
Node:Path = Ext.
where Node is a node, Path is a simple path, and Ext is a simple value. Extensional sentences derivable from the examples given in Section 2 include:
Do:<mor past participle> = done.
Mow:<mor past tense sing one> = mow ed.
Love:<mor present tense sing three> = love s.
Simple definitional sentences take the form
Node:Path == Def.
where Node and Path are as above and Def is an arbitrary descriptor sequence. Definitional sentences already seen in Section 2 include:
Do:<mor past> == did.
VERB:<mor form> == "<mor "<syn form>">".
EN_VERB:<mor past participle> == "<mor root>" en.
Each of these sentences corresponds directly to a DATR statement. However, we extend the notion of a sentence to include an abbreviatory convention for sets of statements relating to a single node. The following single sentence:
Node:
Path1 == Def1
Path2 == Def2
...
PathN == DefN.
abbreviates (and is entirely equivalent to):
Node:Path1 == Def1.
Node:Path2 == Def2.
...
Node:PathN == DefN.
Extensional statements, and combinations of definitional and extensional statements, may be similarly abbreviated, and the examples used throughout this paper make extensive use of this convention. Such compound sentences correspond to a number of individual (and entirely independent) DATR statements.
Finally, it is worth reiterating that DATR descriptions correspond to sets of statements: the order of sentences, or of definitions within a compound sentence, is immaterial to the relationships described.
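The abbreviatory convention, and the point that order is immaterial, can be sketched in a few lines of Python. The function name `expand` and the tuple encoding of statements are assumptions made for this illustration; paths and definitions are kept as plain strings.

```python
def expand(node, equations):
    """Turn one compound sentence into a set of independent statements.

    Each statement is a (node, path, operator, right-hand side) tuple.
    Returning a frozenset makes the irrelevance of order explicit.
    """
    return frozenset((node, path, op, rhs) for path, op, rhs in equations)

# Two of the equations given for Do in this paper.
do_equations = [
    ("<mor past>", "==", "did"),
    ("<mor past participle>", "==", "done"),
]
statements = expand("Do", do_equations)
reordered = expand("Do", list(reversed(do_equations)))
```

Both orderings yield the same set, which is all that a DATR description amounts to on this view.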
3.2 Inheritance in DATR
specified (stated or inherited) via the default mechanism. We have already seen how values are explicitly stated; in this and the following subsections, we continue our exposition by providing an informal account of the semantics of specification via inheritance or by default. The present subsection is only concerned with explicit (i.e., nondefault) inheritance. Section 3.3 deals with implicit specification via DATR's default mechanism.
3.2.1 Local Inheritance. The simplest type of inheritance in DATR is the specification of a value by local inheritance. Such specifications may provide a new node, a new path, or a new node and path to inherit from. An example definition for the lexeme Come illustrates all three of these types:
Come:
<> == VERB
<mor root> == come
<mor past> == came
<mor past participle> == <mor root>
<syn> == INTRANSITIVE:<>.
Here the empty path inherits from VERB so the value of Come:<> is equated to that of VERB:<>. And the past participle inherits from the root: Come:<mor past participle> is equated to Come:<mor root> (i.e., come). In both these inheritances, only one node or path was specified: the other was taken to be the same as that found on the left-hand side of the statement (<> and Come respectively). The third type of local inheritance is illustrated by the final statement, in which both node and path are specified: the syntax of Come is equated with the empty path at INTRANSITIVE, an abstract node defining the syntax of intransitive verbs. 16
There is a natural procedural interpretation of this kind of inheritance, in which the value associated with the definitional expression is determined by "following" the inheritance specification and looking for the value at the new site. So given a DATR description (i.e., a set of definitional statements) and an initial node/path query, we look for the node and path as the left-hand side of a definitional statement. If the definitional statement for this pair provides a local descriptor, then we follow it, by changing one or both of node or path, and then repeat the process with the resulting node/path pair. We continue until some node/path pair specifies an explicit value. In the case of multiple expressions on the right-hand side of a statement, we pursue each of them entirely independently of the others. This operation is local in the sense that each step is carried out without reference to any context wider than the immediate definitional statement at hand.
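The procedural reading of local inheritance can be sketched as a short recursive function. This is a minimal Python illustration under several assumptions: statements are looked up by exact node/path match (no default mechanism), nested evaluable paths are not handled, atoms are strings, and a local descriptor is a (node, path) pair with None for an omitted component. The value given for INTRANSITIVE:<> is an invented placeholder, and VERB is left undefined, so a query on Come:<> would fail in this sketch.

```python
def evaluate(theory, node, path):
    """Follow local descriptors from node/path until atoms are reached."""
    values = []
    for d in theory[(node, path)]:
        if isinstance(d, str):          # an atom: an explicit value
            values.append(d)
        else:                           # a local descriptor: follow it
            new_node, new_path = d
            values.extend(evaluate(
                theory,
                node if new_node is None else new_node,   # omitted node: keep current
                path if new_path is None else new_path,   # omitted path: keep current
            ))
    return values

theory = {
    ("Come", ()): [("VERB", None)],                              # <> == VERB
    ("Come", ("mor", "root")): ["come"],
    ("Come", ("mor", "past")): ["came"],
    ("Come", ("mor", "past", "participle")): [(None, ("mor", "root"))],
    ("Come", ("syn",)): [("INTRANSITIVE", ())],                  # INTRANSITIVE:<>
    ("INTRANSITIVE", ()): ["placeholder"],                       # invented value
}

participle = evaluate(theory, "Come", ("mor", "past", "participle"))
syntax = evaluate(theory, "Come", ("syn",))
```

Each element of a right-hand side is evaluated by its own recursive call, reflecting the statement that multiple expressions are pursued independently.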
Declaratively speaking, local descriptors simply express equality constraints between definitional values for node/path pairs. The statement:
Node1:Path1 == Node2:Path2.
16 Bear in mind that the following are not synonymous
Come:<syn> == INTRANSITIVE:<>.
Come:<syn> == INTRANSITIVE.
since the latter is equivalent to
can be read approximately as "if the value for Node2:Path2 is defined, then the value of Node1:Path1 is defined and equal to it." There are several points to notice here. First, if Node2:Path2 is not defined, then Node1:Path1 is unconstrained, so this is a weak directional equality constraint. However, in practice this has no useful consequences, due to interactions with the default mechanism (see Section 5.1 below). Second, "defined" here means "defined by a definitional statement," that is, a "==" statement: local inheritance operates entirely with definitional statements, implicitly introducing new ones for Node1:Path1 on the basis of those defined for Node2:Path2. Finally, as we shall discuss more fully in the next subsection, "value" here technically covers both simple values and global inheritance descriptors.
3.2.2 Global Inheritance. Like local inheritance, global inheritance comes in three types: node, path, and node/path pair. However, when either the node or the path is omitted from a global inheritance descriptor, rather than using the node or path of the left-hand side of the statement that contains it (the local context of the definition), the values of a global context are used instead. This behavior is perhaps also more easily introduced procedurally rather than declaratively. As we saw above, we can think of local inheritance in terms of following descriptors starting from the query. The local context is initially set to the node and path specified in the query. When a local descriptor is encountered, any missing node or path components are filled in from the local context, and then control passes to the new context created (that is, we look at the definition associated with the new node/path pair). In doing this, the local context also changes to be the new context. Global inheritance operates in exactly the same way: the global context is initially set to the node and path specified in the query. It is not altered when local inheritance descriptors are followed (it "remembers" where we started from), but when a global descriptor is encountered, it is the global context that is used to fill in any missing node or path components in the descriptor, and hence to decide where to pass control to. In addition, both global and local contexts are updated to the new settings. So global inheritance can be seen as essentially the same mechanism as local inheritance, but layered on top of it--following global links alters the local context too, but not vice versa.
For example, when a global path is specified, it effectively returns control to the current global node (often the original query node) but with the newly given path. Thus in Section 2, above, we saw that the node VERB defines the default morphology of present forms using global inheritance from the path for the morphological root:
VERB:<mor present> == "<mor root>".
The node from which inheritance occurs is that stored in the global context: a query of Love:<mor present> will result in inheritance from Love:<mor root> (via VERB:<mor present>), while a query of Do:<mor present> will inherit from Do:<mor root>.
Similarly, a quoted node form accesses the globally stored path value, as in the following example:
Declension1:
<vocative> == -a
<accusative> == -am.
Declension2:
<vocative> == "Declension1"
<accusative> == -um.
Declension3:
<vocative> == -e
<accusative> == Declension2:<vocative>.
Here, the value of Declension3:<accusative> inherits from Declension2:<vocative> and then from Declension1:<accusative>, using the global path (in this case the query path), rather than the local path (<vocative>) to fill out the specification. The resulting value is -am and not -a as it would have been if the descriptor in Declension2 had been local, rather than global.
We observed above that when inheritance through a global descriptor occurs, the global context is altered to reflect the new node/path pair. Thus after Love:<mor present> has inherited through "VERB:<mor root>", the global path will be <mor root> rather than <mor present>. When we consider quoted node/path pairs, it turns out that this is the only property that makes them useful. Since a quoted node/path pair completely respecifies both node and path, its immediate inheritance characteristics are the same as the unquoted node/path pair. However, because it also alters the global context, its effect on any subsequent global descriptors (in the evaluation of the same query) will be different:
Declension1:
<vocative> == "<nominative>"
<nominative> == -a.
Declension2:
<vocative> == Declension1
<nominative> == -u.
Declension3:
<nominative> == -i
<accusative> == "Declension2:<vocative>".
In this example, the value of Declension3:<accusative> inherits from Declension2:<vocative> and then from Declension1:<vocative> and then from Declension2:<nominative> (because the global node has changed from Declension3 to Declension2) giving a value of -u and not -i as it would have been if the descriptor in Declension3 had been local, rather than global.
There are a number of ways of understanding this global inheritance mechanism. The description we have given above amounts to a "global memory" model, in which a DATR query evaluator is a machine equipped with two memories: one containing the current local node and path, and another containing the current global node and path. Both are initialized to the query node and path, and the machine operates by repeatedly examining the definition associated with the current local settings. Local descriptors alter just the local memory, while global descriptors alter both the local and global settings.
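The "global memory" model can be made concrete with a small Python sketch, run here on the two declension examples of this subsection. The encoding is an assumption of the sketch, not a DATR implementation: atoms are strings, a descriptor is a (scope, node, path) triple with None for an omitted component, lookup is by exact match (no defaults), and nested evaluable paths are not handled.

```python
def evaluate(theory, node, path, gnode=None, gpath=None):
    """Evaluate node/path with a local memory (node, path) and a global one."""
    if gnode is None:                      # a fresh query sets both memories
        gnode, gpath = node, path
    values = []
    for d in theory[(node, path)]:
        if isinstance(d, str):
            values.append(d)
            continue
        scope, n, p = d
        if scope == "local":               # fill gaps from the local memory
            nn = node if n is None else n
            pp = path if p is None else p
            values.extend(evaluate(theory, nn, pp, gnode, gpath))
        else:                              # fill gaps from the global memory,
            nn = gnode if n is None else n # then update both memories
            pp = gpath if p is None else p
            values.extend(evaluate(theory, nn, pp, nn, pp))
    return values

VOC, ACC, NOM = ("vocative",), ("accusative",), ("nominative",)

first = {
    ("Declension1", VOC): ["-a"],
    ("Declension1", ACC): ["-am"],
    ("Declension2", VOC): [("global", "Declension1", None)],
    ("Declension2", ACC): ["-um"],
    ("Declension3", VOC): ["-e"],
    ("Declension3", ACC): [("local", "Declension2", VOC)],
}
second = {
    ("Declension1", VOC): [("global", None, NOM)],
    ("Declension1", NOM): ["-a"],
    ("Declension2", VOC): [("local", "Declension1", None)],
    ("Declension2", NOM): ["-u"],
    ("Declension3", NOM): ["-i"],
    ("Declension3", ACC): [("global", "Declension2", VOC)],
}

first_value = evaluate(first, "Declension3", ACC)
second_value = evaluate(second, "Declension3", ACC)
```

Because the global memory is passed by value into each recursive call, every descriptor in a sequence is evaluated from the same global state.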
of) value. Consequently, global descriptors are also distributed through the local inheritance network, and so are implicitly present at many node/path pairs in addition to those they are explicitly defined for. In fact, a global descriptor is implicitly present at every node/path pair that could ever occur as the global context for evaluation of the descriptor at its original, explicitly defined location. This means that once distributed in this way, the global descriptors form a network of weak equality relationships just as the local descriptors do, and distribute the simple values (alone) in the same way. To see this interpretation in action, we consider an alternative analysis of the past participle form of Come. The essential elements of the analysis are as follows:
BARE_VERB:
<mor past participle> == "<mor root>".
Come:
<mor root> == come
<mor past participle> == BARE_VERB.
Local inheritance from BARE_VERB to Come implicitly defines the following statement (in addition to the above):
Come:
<mor past participle> == "<mor root>".
Because we have now brought the global inheritance descriptor to the node corresponding to the global context for its interpretation, global inheritance can now operate entirely locally--the required global node is the local node, Come, producing the desired result:
Come:
<mor past participle> = come.
Notice that, in this last example, the final statement was extensional, not definitional. So far in this paper we have almost entirely ignored the distinction we established between definitional and extensional statements, but with this declarative reading of global inheritance we can do so no longer. Local inheritance uses definitional inheritance statements to distribute simple values and global descriptors. The simple-valued definitional statements thereby defined map directly to extensional statements, and global inheritance uses the global inheritance statements (now distributed), to further distribute these extensional statements about simple values. The statements must be of a formally distinct type, to prevent local inheritance descriptors from distributing them still further. In practice, however, we need not be too concerned about the distinction: descriptions are written as definitional statements, queries are read off as extensional statements. 17
The declarative interpretation of global inheritance suggests an alternative procedural characterization to the one already discussed, which we outline here. Starting from a query, local descriptors alone are used to determine either a value or a global descriptor associated with the queried node/path pair. If the result is a global descriptor, this is used to construct a new query, which is evaluated in the same way. The
17 However, in principle, there is nothing to stop an extensional statement from being specified as part of a DATR description directly. Such a statement would respect global inheritance but not local
process repeats until a value is returned. The difference between this and the earlier model is one of perspective: When a global descriptor is encountered, one can either bring the global context to the current evaluation context (first model), or take the new descriptor back to the global context and continue from there (second model). The significance of the latter approach is that it reduces both kinds of inheritance to a single basic operation with a straightforward declarative interpretation. Thus we see that DATR contains two instances of essentially the same declarative inheritance mechanism. The first, local inheritance, is always specified explicitly, while the second, global inheritance, is specified implicitly in terms of the first.
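The second procedural model can be sketched in the same informal style, using the BARE_VERB analysis of Come given in this subsection. As before, this is a Python illustration under assumptions made here: atoms are strings, descriptors are (scope, node, path) triples with None for omitted components, lookup is by exact match, and nested evaluable paths are ignored. The function names `resolve_locally` and `query` are invented.

```python
def resolve_locally(theory, node, path):
    """Follow local descriptors only; leave global descriptors unresolved."""
    found = []
    for d in theory[(node, path)]:
        if isinstance(d, str) or d[0] == "global":
            found.append(d)
        else:
            _, n, p = d
            found.extend(resolve_locally(
                theory,
                node if n is None else n,
                path if p is None else p,
            ))
    return found

def query(theory, node, path):
    """Take each global descriptor back to the query and start a new query."""
    values = []
    for d in resolve_locally(theory, node, path):
        if isinstance(d, str):
            values.append(d)
        else:
            _, n, p = d
            # Missing components come from the query itself, which plays
            # the role of the global context in this model.
            values.extend(query(
                theory,
                node if n is None else n,
                path if p is None else p,
            ))
    return values

PARTICIPLE = ("mor", "past", "participle")
ROOT = ("mor", "root")

theory = {
    ("BARE_VERB", PARTICIPLE): [("global", None, ROOT)],
    ("Come", ROOT): ["come"],
    ("Come", PARTICIPLE): [("local", "BARE_VERB", None)],
}

value = query(theory, "Come", PARTICIPLE)
```

Here the quoted path found at BARE_VERB is carried back to Come, where it becomes the new query Come:<mor root>.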
Extending these inheritance mechanisms to the more complex DATR expressions is straightforward. Descriptors nested within definitional expressions are treated independently--as though each was the entire value definition rather than just an item in a sequence. In particular, global descriptors that alter the global context in one nested definition have no effect on any others. Each descriptor in a definitional sequence or evaluable path is evaluated from the same global state. In the case of global evaluable paths, once the subexpressions have been evaluated, the expression containing the resultant path is also evaluated from the same global state.
3.3 Definition by Default
The other major component of DATR is definition by default. This mechanism allows a DATR definitional statement to be applicable not only for the path specified in its left-hand side, but also for any rightward extension of that path for which no more specific definitional statement exists. In effect, this "fills in the gaps" between paths defined at a node, on the basis that an undefined path takes its definition from the path that best approximates it without being more specific. 18 Of course, to be effective, this "filling in" has to take place before the operation of the inheritance mechanisms described in the previous section.
Consider, for example, the definition of Do we gave above.
Do:
<> == VERB
<mor root> == do
<mor past> == did
<mor past participle> == done
<mor present tense sing three> == does.
Filling in the gaps between these definitions, we can see that many paths will be implicitly defined only by the empty path specification. Examples include:
Do:
<mor> == VERB
<syn> == VERB
<mor present> == VERB
<syn cat> == VERB
<syn type> == VERB
<mor present tense sing one> == VERB.
If there had been no definition for <>, then none of these example paths would have been defined at all, since there would have been no leading subpath with a definition. Note how <mor> itself takes its definition from <>, since all the explicitly defined <mor ...> specifications have at least one further attribute.
The definition for <mor past> overrides default definition from <> and in turn provides a definition for longer paths. However, <mor past participle> blocks default definition from <mor past>. Thus the following arise: 19
Do:
<mor past tense> == did
<mor past tense plur> == did
<mor past tense sing three> == did
<mor past participle plur> == done
<mor past participle sing one> == done.
Similarly all the <mor present> forms inherit from VERB except for the explicitly cited <mor present tense sing three>.
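The choice of defining path under the default mechanism amounts to a longest-prefix match. The following Python sketch illustrates this on the paths explicitly defined at Do; the function name `defining_path` and the tuple encoding of paths are assumptions of the sketch.

```python
def defining_path(defined_paths, path):
    """Return the longest explicitly defined prefix of `path`, or None."""
    for n in range(len(path), -1, -1):     # try the longest prefix first
        if path[:n] in defined_paths:
            return path[:n]
    return None                            # no leading subpath is defined

# The paths explicitly defined at the node Do.
do_paths = {
    (),
    ("mor", "root"),
    ("mor", "past"),
    ("mor", "past", "participle"),
    ("mor", "present", "tense", "sing", "three"),
}

from_empty = defining_path(do_paths, ("mor",))
from_past = defining_path(do_paths, ("mor", "past", "tense", "plur"))
from_participle = defining_path(do_paths, ("mor", "past", "participle", "plur"))
```

Removing the empty path from `do_paths` leaves a path such as `("syn",)` with no defining path at all, matching the observation about the definition for <>.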
Definition by default introduces new DATR sentences, each of whose left-hand-side paths is an extension of the left-hand-side paths of some explicit sentence. This path extension carries over to any paths occurring on the right-hand side as well. For example, the sentence:
VERB:
<mor present tense> == "<mor root>"
<mor form> == <mor "<syn form>">.
gives rise to the following, inter alia:
VERB:
<mor present tense sing> == "<mor root sing>"
<mor present tense plur> == "<mor root plur>"
<mor form present> == <mor "<syn form present>" present>
<mor form passive> == <mor "<syn form passive>" passive>.
This extension occurs for all paths in the right-hand side, whether they are quoted or unquoted and/or nested in descriptor sequences or evaluable paths.
The intent of this path extension is to allow descriptors to provide not simply a single definition for a path but a whole set of definitions for extensions to that path, without losing path information. In some cases this can lead to gratuitous extensions to paths--path attributes specifying detail beyond any of the specifications in the overall description. However, this does not generally cause problems since such gratuitously detailed paths, being unspecified, will always take their value from the most specific path that is specified; effectively, gratuitous detail is ignored. 20 Indeed, DATR's approach to default information always implies an infinite number of unwritten DATR statements, with paths of arbitrary length.
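Right-hand-side path extension can be sketched as a recursive rewrite. In this Python illustration (an assumed encoding, not a DATR implementation), atoms are strings and an inheritance descriptor is a (quoted, node, items) triple, where `items` is the descriptor sequence inside the angle brackets or None for a bare node. The helper `show` renders the result in DATR notation so that it can be compared with the VERB example above.

```python
def extend(d, suffix):
    """Append `suffix` to every path in descriptor d, including nested ones."""
    if isinstance(d, str):
        return d                       # atoms are unaffected
    quoted, node, items = d
    if items is None:
        return d                       # bare node: no written path to extend
    new_items = tuple(extend(x, suffix) for x in items) + tuple(suffix)
    return (quoted, node, new_items)

def show(d):
    """Render a descriptor in DATR notation."""
    if isinstance(d, str):
        return d
    quoted, node, items = d
    body = node or ""
    if items is not None:
        if node:
            body += ":"
        body += "<" + " ".join(show(x) for x in items) + ">"
    return f'"{body}"' if quoted else body

# <mor "<syn form>"> and "<mor root>" from the VERB sentence above.
form = (False, None, ("mor", (True, None, ("syn", "form"))))
root = (True, None, ("mor", "root"))

extended_form = show(extend(form, ("present",)))
extended_root = show(extend(root, ("sing",)))
```

A bare node descriptor is left unchanged on the assumption that it takes its path from the left-hand side, which has already been extended.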
19 The past participle extensions here are purely for the sake of the formal example---they have no role to play in the morphological description of English (but cf. French, where past participles inflect for gender and number).
3.4 Abbreviatory Variables
The default mechanism of DATR provides for generalization across sets of atoms by means of path extension, and is the preferred mechanism to use in the majority of cases. However, to transduce atoms in the path domain to atoms in the value domain (see Section 4.3, below), it is extremely convenient to use abbreviatory variables over finite sets of atoms. This is achieved by declaring DATR variables whose use constitutes a kind of macro: they can always be eliminated by replacing the equations in which they occur with larger sets of equations that spell out each value of the variables. Conventionally, variable names begin with the $ character and are declared in one of the following three ways:
#vars $Var1: Range1 Range2 ... .
#vars $Var2: Range1 Range2 ... - RangeA RangeB ... .
#vars $Var3.
Here, the first case declares a variable $Var1 that ranges over the values Range1, Range2, ... where each RangeN is either an atom or a variable name; the second case declares $Var2 to range over the same range, but excluding values in RangeA RangeB ...; and the third declares $Var3 to range over the full (finite) set of atoms in the language. 21 For example:
#vars $letters: a b c d e f g h i j k l m n o p q r s t u v w x y z.
#vars $vowels: a e i o u.
#vars $consonants: $letters - $vowels.
#vars $not_z: $letters - z.
#vars $odd: 1 3 5 7 9.
#vars $even: 0 2 6 4 8.
#vars $digit: $odd $even.
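The three declaration forms can be read as simple set operations. The Python sketch below resolves the example declarations into sets of atoms. It assumes an encoding invented here: each variable maps to an (include, exclude) pair of lists, or to None for the third form, and the function `resolve` and the name `$not_z` (a reading of a damaged declaration) are assumptions. Cyclic declarations are not guarded against.

```python
def resolve(decls, all_atoms):
    """Map each declared variable to the set of atoms it ranges over."""
    cache = {}

    def members(items):
        out = set()
        for item in items:
            # A range element is either a variable name or an atom.
            out |= ranges_of(item) if item.startswith("$") else {item}
        return out

    def ranges_of(name):
        if name not in cache:
            spec = decls[name]
            if spec is None:                   # "#vars $Var3." form
                cache[name] = set(all_atoms)
            else:
                include, exclude = spec
                cache[name] = members(include) - members(exclude)
        return cache[name]

    return {name: ranges_of(name) for name in decls}

decls = {
    "$letters": (list("abcdefghijklmnopqrstuvwxyz"), []),
    "$vowels": (list("aeiou"), []),
    "$consonants": (["$letters"], ["$vowels"]),
    "$not_z": (["$letters"], ["z"]),
    "$odd": (list("13579"), []),
    "$even": (list("02648"), []),
    "$digit": (["$odd", "$even"], []),
    "$any": None,
}
ranges = resolve(decls, all_atoms={"a", "b", "1"})
```

The order of atoms in a declaration plays no role, since each range is a set.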
Caution has to be exercised in the use of DATR variables for two reasons. One is that their use makes it hard to spot multiple conflicting definitions:
#vars $vowel: a e i o u.
DIPTHONG:
<e> == e i <>
<$vowel> == $vowel e <>.
Here, <e> appears on the left-hand side of two conflicting definitions. Exactly what happens to such an improper description in practice depends on the implementation, and usages of this kind can be the source of hard-to-locate bugs (see also Section 5.1, below).
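Since variables are a kind of macro, one way to surface such conflicts is to spell out the equations and look for repeated left-hand sides. The following Python sketch does this for the DIPTHONG example. It is only an illustration: equations are encoded as pairs of flat token tuples, `expand_equation` is a name invented here, and no claim is made about how any DATR implementation treats the conflict.

```python
from collections import Counter

def expand_equation(lhs, rhs, var, values):
    """Replace `var` by each value in turn, yielding one equation per value."""
    def subst(seq, value):
        return tuple(value if tok == var else tok for tok in seq)
    return [(subst(lhs, v), subst(rhs, v)) for v in values]

vowels = ["a", "e", "i", "o", "u"]

# <e> == e i <>   and   <$vowel> == $vowel e <>
equations = [(("e",), ("e", "i", "<>"))]
equations += expand_equation(("$vowel",), ("$vowel", "e", "<>"), "$vowel", vowels)

# A left-hand side that occurs more than once signals conflicting definitions.
counts = Counter(lhs for lhs, _ in equations)
conflicts = [lhs for lhs, n in counts.items() if n > 1]
```

Only the path <e> is reported, with one definition written explicitly and one arising from the variable.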
The other reason is that one can fall into the trap of using variables to express generalizations that would be better expressed using the path extension mechanism. Here is a very blatant example:
#vars $number: singular plural.
extensional statements:
Cat:
<plural> = -s.
Datum:
<plural> = -a.
Alumnus:
<plural> = -i.
We do not need to invoke an attribute called case to get this technique to work. For example, in Section 2, we gave the following definition of <mor form> in terms of <syn form>:
VERB:
<mor form> == <mor "<syn form>">.
Here the feature <syn form> returns a value (such as passive participle or present tense sing three) that becomes part of the path through which <mor form> inherits. This means that nodes for surface word forms need only state their parent lexeme and <syn form> feature in order for their <mor form> to be fully described. So, as we saw in Section 2 above, the passive participle form of sew is fully described by the node definition for Word3.
Word3:
<> == Sew
<syn form> == passive participle.
For finite forms, we could use a similar technique. From this:
Word4 :
<> == Sew
<syn form> == present sing third.
we would want to be able to infer this:
Word4:
<mor form> = sew s
However, the components of <syn form>, present, sing, third are themselves values of features we probably want to represent independently. One way to achieve this is to define a value for <syn form> which is itself parameterized from the values of these other features. And the appropriate place to do this is in the VERB node, thus:
VERB:
<syn form> == "<syn tense>" "<syn number>" "<syn person>".
This says that the default value for the syntactic form of a verb is a finite form, but exactly which finite form depends on the settings of three other paths, <syn tense>,