Object Identity and Uniqueness Constraints

Any database data model must contain a way to identify each unique object in the database. Holmes's identification of the individual weapon used at Thor Bridge shows the value of individuality.

"No doubt she blamed this innocent lady for all those harsh dealings and unkind words with which her husband tried to repel her too demonstrative affection. Her first resolution was to end her own life. Her second was to do it in such a way as to involve her victim in a fate which was worse far than any sudden death could be.

"We can follow the various steps quite clearly, and they show a remarkable subtlety of mind. A note was extracted very cleverly from Miss Dunbar which would make it appear that she had chosen the scene of the crime. In her anxiety that it should be discovered she somewhat overdid it, by holding it in her hand to the last. This alone should have excited my suspicions earlier than it did.

"Then she took one of her husband's revolvers—there was, as you saw, an arsenal in the house—and kept it for her own use. A similar one she concealed that morning in Miss Dunbar's wardrobe after discharging one barrel, which she could easily do in the woods without attracting attention. She then went down to the bridge where she had contrived this exceedingly ingenious method for getting rid of her weapon. When Miss Dunbar appeared she used her last breath in pouring out her hatred, and then, when she was out of hearing, carried out her terrible purpose. Every link is now in its place and the chain is complete. The papers may ask why the mere was not dragged in the first instance, but it is easy to be wise after the event, and in any case the expanse of a reed-filled lake is no easy matter to drag unless you have a clear perception of what you are looking for and where." [THOR]

Most database systems provide unique, proprietary, and hidden identifiers for rows and objects (ROWIDs in Oracle, for example). OO database managers provide uniqueness through the object identifier, or OID. Identifying an object through this kind of surrogate is implicit, or existence-based, identification [Blaha and Premerlani 1998, pp. 193-194]. The alternative is explicit, or value-based, identification. With explicit identification, you can identify the object with the values of one or more attributes of the object. Object identity corresponds to the ER modeling concept of primary key. I use the term "OID" to refer to any kind of key that uniquely identifies an object.

To recap the key definitions, this time in terms of objects: a candidate key is a set of implicit or explicit attributes that uniquely identify an object of a class. The set of attributes must be minimal, in that you should not be able to remove any attribute and still preserve the uniqueness of the objects. Also, no candidate key can be null, as this would logically invalidate the equality comparison you need to apply to ensure uniqueness. A primary key is a candidate key that you choose to become the identifier of the class, the OID. An alternate key is a candidate key that is not the primary key. The alternate key thus represents an implicit or explicit uniqueness constraint.
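The three conditions in these definitions (uniqueness, no nulls, minimality) can be checked mechanically against a sample of objects. The sketch below is an illustration under stated assumptions, not the book's method: objects are plain Python dictionaries, and the helpers `is_unique` and `is_candidate_key` are hypothetical names. Because it looks only at the data it is given, it can refute a proposed key but never prove one; a real key is a rule about every object the class can ever hold.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if attrs hold no nulls and no duplicate value combinations."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if None in key:   # a null defeats the equality test uniqueness relies on
            return False
        if key in seen:   # two objects share the same key values
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """Unique and minimal: no proper, non-empty subset of attrs is also unique."""
    attrs = tuple(attrs)
    if not is_unique(rows, attrs):
        return False
    return not any(is_unique(rows, subset)
                   for size in range(1, len(attrs))
                   for subset in combinations(attrs, size))


# Hypothetical sample objects: surnames repeat within a family.
people = [
    {"person_id": 1, "surname": "Dunbar"},
    {"person_id": 2, "surname": "Gibson"},
    {"person_id": 3, "surname": "Gibson"},
]

print(is_candidate_key(people, ["person_id"]))             # True
print(is_candidate_key(people, ["person_id", "surname"]))  # False: not minimal
print(is_candidate_key(people, ["surname"]))               # False: duplicate values
```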

Note

In the UML, objects have identity but data values do not. Data values are values of primitive types, types such as int or VARCHAR that have no assumption of underlying object identity. Objects are instances of classes. See the section "Domain Constraints" for a complete discussion of data values and object values.

The UML does not provide any standard means of expressing object identity because it assumes that object identity is a basic, implicit, and automatic property of an object. To include the concept in UML data modeling, therefore, you must extend the UML with some additional properties for attributes. You define explicit keys in a UML class diagram with two extended tagged values, {OID} and {alternate OID}. Again, these are not standard UML notation; they are custom UML extensions for data modeling.

These two properties identify those attributes that serve to identify a unique instance of the class. By attaching the {OID} tagged value to an attribute, you specify that attribute as part of the primary key, or object identifier. By attaching the {alternate OID=n} tagged value to an attribute, you specify it as part of alternate key n, where n is an integer.

The {OID} and {alternate OID} properties constrain the objects of the class regardless of storage. They correspond to the SQL PRIMARY KEY constraint and the UNIQUE constraint, respectively. If, for example, you create an object-relational type and instantiate two tables from that type, the {OID} property on the attributes constrains all of the objects in both tables to have unique values with respect to each other.
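To make the correspondence concrete, here is a minimal sketch that translates tagged attributes into SQL and lets SQLite (through Python's standard `sqlite3` module) enforce the result. The string encoding of the tags ("OID", "alternate OID=n"), the `table_ddl` and `rejected` helpers, and the Person columns are all assumptions made for the example. Key attributes also get NOT NULL, following the rule that no candidate key can be null; SQLite, for historical reasons, allows NULLs in most PRIMARY KEY columns unless they are declared NOT NULL.

```python
import sqlite3
from collections import defaultdict


def table_ddl(table, attributes):
    """Build CREATE TABLE text from (name, sql_type, tags) triples.

    tags is a set of strings in an assumed encoding of the UML
    extensions: "OID" or "alternate OID=n".
    """
    clauses, oid = [], []
    alternates = defaultdict(list)  # alternate key number -> attribute names
    for name, sql_type, tags in attributes:
        # Any key attribute (primary or alternate) must not be null.
        clauses.append(f"{name} {sql_type}" + (" NOT NULL" if tags else ""))
        for tag in tags:
            if tag == "OID":
                oid.append(name)
            elif tag.startswith("alternate OID="):
                alternates[int(tag.split("=", 1)[1])].append(name)
    if oid:
        clauses.append(f"PRIMARY KEY ({', '.join(oid)})")
    for n in sorted(alternates):
        clauses.append(f"UNIQUE ({', '.join(alternates[n])})")
    return f"CREATE TABLE {table} ({', '.join(clauses)})"


person = [
    ("person_id", "INTEGER", {"OID"}),
    ("ssn", "TEXT", {"alternate OID=1"}),
    ("name", "TEXT", set()),
]
ddl = table_ddl("Person", person)
print(ddl)
# CREATE TABLE Person (person_id INTEGER NOT NULL, ssn TEXT NOT NULL, name TEXT, PRIMARY KEY (person_id), UNIQUE (ssn))

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute("INSERT INTO Person VALUES (1, '111-11-1111', 'Neil Gibson')")


def rejected(sql):
    """True if SQLite refuses the statement with a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO Person VALUES (1, '222-22-2222', 'Grace Dunbar')"))  # True: duplicate {OID}
print(rejected("INSERT INTO Person VALUES (2, '111-11-1111', 'Grace Dunbar')"))  # True: duplicate {alternate OID=1}
print(rejected("INSERT INTO Person VALUES (2, '222-22-2222', 'Grace Dunbar')"))  # False
```

Generating the constraints from the tags, rather than writing them by hand, keeps the class diagram and the schema from drifting apart.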

Note

You only specify the {OID} property on the primary key attributes of the regular classes you build, not on association classes. The OID of an association class is always implicit and consists of the combined OID attributes of the classes that participate in the relationship.
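In a relational schema, the implicit OID of an association class typically becomes a composite primary key made of the foreign keys to the participating classes. The sketch below assumes a hypothetical Residence association class between Person and Household, with a `role` attribute of its own, and uses SQLite through Python's `sqlite3` module. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Household (household_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Association class: no {OID} of its own; its key combines the participants' OIDs.
CREATE TABLE Residence (
  person_id    INTEGER NOT NULL REFERENCES Person(person_id),
  household_id INTEGER NOT NULL REFERENCES Household(household_id),
  role         TEXT,
  PRIMARY KEY (person_id, household_id)
);
INSERT INTO Person VALUES (1, 'Neil Gibson'), (2, 'Grace Dunbar');
INSERT INTO Household VALUES (10, 'Thor Place');
""")


def accepted(sql):
    """True if SQLite stores the row, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted("INSERT INTO Residence VALUES (1, 10, 'owner')"))      # True
print(accepted("INSERT INTO Residence VALUES (2, 10, 'governess')"))  # True
print(accepted("INSERT INTO Residence VALUES (1, 10, 'tenant')"))     # False: same (person, household) pair
print(accepted("INSERT INTO Residence VALUES (3, 10, 'visitor')"))    # False: no Person 3
```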

Figure 7-14 shows the OIDs for the Person and Identification classes. The Person class is a complete object in itself and has strong object identity. If you need to make identity explicit, you usually add an attribute with a suitable type to serve as the object identifier, in this case the PersonID: OID {OID} attribute.

Figure 7-14: The Explicit {OID} Tagged Value

In relational and object-relational systems, this corresponds to an identifier you generate, such as a sequence in Oracle or an IDENTITY column in SQL Server. Be careful, though: if you create the OID on a table basis, its scope is the table, not the class. That is, if you define multiple tables from a class, the OID will not uniquely identify the objects across those tables, just within each table. In most OODBMS systems, objects have built-in OIDs that you can access and use for establishing identity. Referential integrity usually happens automatically because of the way you relate objects to one another, and the systems use the OIDs in representing the relationships. Codd described this problem and its requirements for "complete" RDBMS products. He specified that relational systems must behave in a similar way, treating a primary key as potentially shared between several tables [Codd 1990, pp. 25-26, 36-37]. Specifically, the attributes share a common, composite domain from which all the participating tables draw their values. Defining this in terms of domains lets you preserve strong typing under joins and set operations as well as in comparison expressions [Codd 1990, p. 49].
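The scope problem is easy to reproduce. In this sketch, two hypothetical tables, CurrentPerson and ArchivedPerson, are both instantiated from one Person class. SQLite has no sequence objects, so a one-column AUTOINCREMENT table, PersonOID, stands in for an Oracle sequence as the class-wide OID generator; that substitution and the `load` helper are assumptions of the example, not something the text prescribes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE CurrentPerson  (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ArchivedPerson (person_id INTEGER PRIMARY KEY, name TEXT);
-- Stand-in for a class-wide sequence: SQLite has no sequence objects.
CREATE TABLE PersonOID (n INTEGER PRIMARY KEY AUTOINCREMENT);
"""


def load(shared_oids):
    """Store one Person object in each table; return all person_id values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for table, name in [("CurrentPerson", "Grace Dunbar"),
                        ("ArchivedPerson", "Maria Gibson")]:
        if shared_oids:
            # Class scope: every OID comes from one generator.
            oid = conn.execute("INSERT INTO PersonOID DEFAULT VALUES").lastrowid
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (oid, name))
        else:
            # Table scope: each table numbers its own rows.
            conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    ids = conn.execute("SELECT person_id FROM CurrentPerson "
                       "UNION ALL SELECT person_id FROM ArchivedPerson").fetchall()
    conn.close()
    return sorted(row[0] for row in ids)


print(load(shared_oids=False))  # [1, 1]: the OIDs collide across the tables
print(load(shared_oids=True))   # [1, 2]: unique across the whole class
```

Cross-table uniqueness here is still only a convention: the shared generator hands out distinct values, but nothing in this schema stops someone from inserting a duplicate into one of the tables by hand.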

When you build a generalization hierarchy, you specify the {OID} property at the top of the hierarchy. The subclasses of the root class inherit the explicit object identifier from that class. You can always add {alternate OID} specifications lower in the generalization hierarchy. Multiple inheritance, of course, throws something of a monkey wrench into this scheme, just as it does for object structure. If a class inherits {OID} attributes from more than one parent, you must override the property in the inheriting class to specify one or the other attribute set as the {OID} of the multiply inheriting subclass. You can also specify a completely new {OID} property if that makes sense.
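One common relational mapping of this rule, sketched below with SQLite, gives each subclass its own table whose primary key is the root class's OID and is also a foreign key to the root table. The Person/Employee hierarchy, the `employee_no` alternate key, and the `accepted` helper are hypothetical, and table-per-class is only one of several possible mappings for generalization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
-- Root class: the {OID} is declared here.
CREATE TABLE Person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Subclass: inherits the OID as its own primary key, which must also exist
-- in the root table; employee_no is an {alternate OID} added lower down.
CREATE TABLE Employee (
  person_id   INTEGER PRIMARY KEY REFERENCES Person(person_id),
  employee_no TEXT NOT NULL UNIQUE
);
INSERT INTO Person VALUES (1, 'Neil Gibson'), (2, 'Grace Dunbar');
INSERT INTO Employee VALUES (2, 'E-100');
""")


def accepted(sql):
    """True if SQLite stores the row, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted("INSERT INTO Employee VALUES (3, 'E-101')"))  # False: no Person 3 to inherit from
print(accepted("INSERT INTO Employee VALUES (1, 'E-100')"))  # False: duplicate alternate OID
print(accepted("INSERT INTO Employee VALUES (1, 'E-101')"))  # True

# The subclass row and its root row share one identity.
print(conn.execute("SELECT p.name, e.employee_no FROM Person p "
                   "JOIN Employee e USING (person_id) ORDER BY p.person_id").fetchall())
# [('Neil Gibson', 'E-101'), ('Grace Dunbar', 'E-100')]
```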

Tip

Making sense of multiple inheritance is hard. This is just another reason to avoid multiple inheritance in data modeling. One way or another, this is going to complicate your life. Try to avoid it. If you can't avoid it, make the OID implicit and let the system handle identity through object existence rather than through inheritance. If you won't do that, all I can say is that I feel your pain.

Similarly, a class related to another through composite aggregation (a weak relationship in ER terms) gets the OID from the aggregating class. This does not constitute the complete OID for the aggregated class, though; you need to specify any additional attributes required to identify the individual element within the aggregate. For example, in Figure 7-14, the Identification class OID is the combination of the Person OID from the aggregating person and the identification number. This usage is a bit unusual in UML.
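At the object level, the composed OID can be expressed as a property that pairs the aggregating object's OID with the element's local number. This Python sketch uses hypothetical `Person` and `Identification` dataclasses named after the classes in Figure 7-14; their attributes and the `add_identification` method are assumptions. The aggregate (the Person) is what guarantees that local numbers do not repeat within one owner.

```python
from dataclasses import dataclass, field


@dataclass
class Identification:
    # The aggregating object; excluded from repr/eq to avoid recursion.
    owner: "Person" = field(repr=False, compare=False)
    number: int  # identifies the element only within its owner
    kind: str

    @property
    def oid(self):
        """Aggregating object's OID followed by the local identification number."""
        return (self.owner.person_id, self.number)


@dataclass
class Person:
    person_id: int  # the explicit {OID}
    name: str
    identifications: dict = field(default_factory=dict, repr=False)

    def add_identification(self, number, kind):
        # The aggregate keeps local numbers unique within one owner.
        if number in self.identifications:
            raise ValueError(f"person {self.person_id} already has identification {number}")
        ident = Identification(self, number, kind)
        self.identifications[number] = ident
        return ident


gibson = Person(1, "Neil Gibson")
dunbar = Person(2, "Grace Dunbar")
first = gibson.add_identification(1, "passport")
other = dunbar.add_identification(1, "passport")
print(first.oid, other.oid)  # (1, 1) (2, 1): same local number, different OIDs
```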

In OO design and programming, and hence in the UML, every object has implicit identity. In transient systems, identity often is the address in memory of the object. In persistent OODBMS systems, each object has an OID that often contains physical or logical storage information. In relational and ORDBMS products, there is usually a similar way to identify each row. None of these identifiers is explicitly part of the conceptual schema or class design; it just happens automatically. This is most of the reason why there is no standard way to refer to OIDs in UML diagrams; other constructs such as relationships, classes, and objects imply the whole concept.
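Python makes the distinction between implicit and explicit identity easy to see. In CPython, `id()` returns the object's memory address and `is` compares identities, while a dataclass's generated `==` compares attribute values. The `Revolver` class is hypothetical, a nod to the two similar weapons in the Thor Bridge case.

```python
from dataclasses import dataclass


@dataclass
class Revolver:
    owner: str
    kind: str


# Two weapons with identical attribute values.
first = Revolver("Gibson", "revolver")
second = Revolver("Gibson", "revolver")

print(first == second)          # True: equal by value (explicit identification)
print(first is second)          # False: distinct objects (implicit identity)
print(id(first) == id(second))  # False: different addresses while both are alive
```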

Which approach is better, explicit or implicit identity? From the purist OO perspective, you should leave out the explicit OID attributes entirely. Figure 7-15 shows what Figure 7-14 would look like using this approach. Given this diagram, when you generate a relational database schema, you would add a column that uniquely identifies each row in the Person table corresponding to a Person object. The aggregation relationship between the two classes implies that there is a foreign key in Identification pointing to a single person. The aggregation also implies that the foreign key is part of the primary key. Instead of generating a unique OID column for Identification, therefore, you generate two columns. First is the Person OID that refers back to the Person table as a foreign key. You follow it with a unique number for each row with the same Person OID. You could specify the attribute to use as the second OID element by using a different tagged value, {aggregate OID}. This tags the attribute as the element to add to the aggregating object's OID to create the new object's OID. If you wanted to replace the implicit OID with an explicit set of attributes, you would just specify the {OID} tagged value on those attributes. There would be no implicit OID in the underlying conceptual schema.
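The generation step described here, a foreign key to the owner followed by a number that restarts for each owner, can be sketched as follows. The `assign_aggregate_oids` helper, the column names, and the in-memory counter are assumptions for illustration; a production system would also have to derive the next number from rows already stored and guard that against concurrent inserts.

```python
import sqlite3
from collections import defaultdict
from itertools import count


def assign_aggregate_oids(items):
    """Turn (person_oid, kind) pairs into (person_oid, seq, kind) rows,
    numbering seq from 1 separately for each person."""
    counters = defaultdict(lambda: count(1))  # one counter per aggregating object
    return [(person_oid, next(counters[person_oid]), kind)
            for person_oid, kind in items]


rows = assign_aggregate_oids([(1, "passport"), (2, "passport"), (1, "identity card")])
print(rows)  # [(1, 1, 'passport'), (2, 1, 'passport'), (1, 2, 'identity card')]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Person (person_oid INTEGER PRIMARY KEY);
CREATE TABLE Identification (
  person_oid INTEGER NOT NULL REFERENCES Person(person_oid),  -- foreign key, first OID element
  seq        INTEGER NOT NULL,                                 -- the {aggregate OID} element
  kind       TEXT,
  PRIMARY KEY (person_oid, seq)
);
INSERT INTO Person VALUES (1), (2);
""")
conn.executemany("INSERT INTO Identification VALUES (?, ?, ?)", rows)
```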

Figure 7-15: The Implicit OID Approach

The second approach makes all this explicit at the design level, much as in Figure 7-14. The explicit approach has the benefit of keeping all the attributes clear and open in your design. This makes the connection between your data model and your conceptual schema in the target data model more direct. Since the objects are persistent, you will usually need methods on your classes that manipulate the OIDs on their way from persistent memory to transient memory. OODBMS products handle this automatically, but RDBMS and ORDBMS products usually don't. Making the OID explicit gives you the ability to design OID handling into your persistent classes. The implicit approach, on the other hand, hides the details and lets you do the work when you convert the data model to the conceptual schema in your target database. As long as you have standard OID handling built into your persistent class hierarchy in some way, this works fine. In relational databases, this requires a good deal of additional work.

You can have an explicit OID spanning two or more attributes of a class, which corresponds to the ER composite key from Chapter 6. When you join sets of objects (tables or whatever) on the primary key, and that key has more than one attribute, the syntax for the join condition can get very messy, especially for outer joins. You need one comparison expression for each attribute in the join. You can simplify coding a good deal by replacing multiple-attribute explicit OIDs (composite keys) with single-attribute explicit OIDs that you generate. This usage is somewhere between an explicit and an implicit OID. It's implicit because it depends on existence. It's explicit because it has to be an actual attribute of the class, since you're creating it for explicit use.
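The difference in join syntax shows up clearly in a small SQLite example run from Python. The hypothetical Identification table keeps its composite key (person_oid, seq) as an alternate key and adds a generated single-attribute OID, ident_id; two hypothetical referencing tables point at it in the two styles. Both outer joins return the same rows, but the composite version needs one comparison per key attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Identification (
  ident_id   INTEGER PRIMARY KEY,  -- generated single-attribute OID
  person_oid INTEGER NOT NULL,
  seq        INTEGER NOT NULL,
  kind       TEXT,
  UNIQUE (person_oid, seq)         -- the composite key, kept as an alternate key
);
-- Two hypothetical referencing tables, one per keying style.
CREATE TABLE VerificationByKey (person_oid INTEGER, seq INTEGER, verified_by TEXT);
CREATE TABLE VerificationById  (ident_id INTEGER, verified_by TEXT);
INSERT INTO Identification VALUES (1, 1, 1, 'passport'), (2, 2, 1, 'passport'),
                                  (3, 1, 2, 'identity card');
INSERT INTO VerificationByKey VALUES (1, 1, 'Holmes'), (1, 2, 'Watson');
INSERT INTO VerificationById  VALUES (1, 'Holmes'), (3, 'Watson');
""")

# Composite key: one comparison per key attribute in the join condition.
by_key = conn.execute("""
  SELECT i.kind, v.verified_by
  FROM Identification i
  LEFT OUTER JOIN VerificationByKey v
    ON v.person_oid = i.person_oid AND v.seq = i.seq
  ORDER BY i.ident_id""").fetchall()

# Generated single-attribute OID: a single comparison.
by_id = conn.execute("""
  SELECT i.kind, v.verified_by
  FROM Identification i
  LEFT OUTER JOIN VerificationById v ON v.ident_id = i.ident_id
  ORDER BY i.ident_id""").fetchall()

print(by_key == by_id)  # True
print(by_id)  # [('passport', 'Holmes'), ('passport', None), ('identity card', 'Watson')]
```

With a three- or four-attribute key the composite join condition grows accordingly, while the surrogate join stays a single equality.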

Yet another issue arises as your data model grows in size. For largish databases, you may find that consistency in your choice of identity approach yields productivity benefits. Consistency, in this case, means the implicit approach. With a large number of tables, programmers can easily get confused by a plethora of different OID attributes scattered around the database. With implicit OIDs, and with a standard OID naming convention for persistent attributes in the conceptual schema, programmers will usually be able to code without confusion about which attributes are identifying ones.

Source: Robert Muller, Database Design for Smarties: Using UML for Data Modeling, pp. 107-110.