Subtypes: Type versus Class versus Interface

Even the UML definition of generalization embeds the concept of subtype: "An instance of the more specific element may be used where the more general element is allowed." In a strongly typed system, the behavior of an object depends on its type, and you can't call operations or access attributes in contexts that require objects of a different type. Subtyping lets you extend the concept of "the same type" to a hierarchy of related objects rather than just to the objects of a single class.
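A minimal Java sketch of this substitutability follows. The class names echo the Identification example used in this chapter, but the constructor, the holder field, and the describe operation are assumptions made only for illustration.

```java
final class SubtypeDemo {
    // Hypothetical general type; the field and operation are illustrative only.
    static class Identification {
        private final String holder;

        Identification(String holder) {
            this.holder = holder;
        }

        String describe() {
            return "identification for " + holder;
        }
    }

    // A more specific element: every Passport is also an Identification.
    static class Passport extends Identification {
        Passport(String holder) {
            super(holder);
        }

        @Override
        String describe() {
            return "passport; " + super.describe();
        }
    }

    // Declared against the general type, so it accepts any subtype instance.
    static String check(Identification id) {
        return id.describe();
    }

    public static void main(String[] args) {
        System.out.println(check(new Identification("Holmes")));
        System.out.println(check(new Passport("Holmes")));
    }
}
```

The check method never names Passport, yet the compiler accepts a Passport argument because the subclass is a subtype of the declared parameter type.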

Computer scientists carefully distinguish the notions of type and subtype from the concept of inheritance. Programming language designers are less careful for the most part, though some OO languages do make the distinction. Languages such as Smalltalk have no static type checking at all, just inheritance. In Smalltalk, for example, you can send a message to any object that provides the method with the signature your message specifies.

The rationale for strong typing is that this kind of flexibility leads inevitably to defects, for at least three reasons. First, you can accidentally set up your system at runtime to call a method that the receiving object doesn't provide, yielding an exception (falling off the top of the inheritance hierarchy). Second, you can call a method that does something entirely unrelated to what you intended. Third, you can pass objects that differ enough in format to yield operating system or runtime system errors such as divide-by-zero or access violations.
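The first of these defects can be approximated in Java, which is not Smalltalk, by using reflection as a stand-in for an unchecked message send. The sketch below is an assumption-laden illustration: the send helper, the Passport and Invoice classes, and the "doesNotUnderstand" text are all hypothetical names chosen for the example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class DynamicSendDemo {
    public static final class Passport {
        public String describe() {
            return "passport";
        }
    }

    // Deliberately provides no describe() method.
    public static final class Invoice {
        public String total() {
            return "42.00";
        }
    }

    // Looks the method up by name at runtime, as an untyped send would.
    static String send(Object receiver, String selector) {
        try {
            Method method = receiver.getClass().getMethod(selector);
            return String.valueOf(method.invoke(receiver));
        } catch (NoSuchMethodException e) {
            // The lookup fell off the top of the hierarchy without a match.
            return "doesNotUnderstand: " + selector;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(send(new Passport(), "describe"));
        System.out.println(send(new Invoice(), "describe"));
    }
}
```

With ordinary Java method calls the second send would be rejected at compile time; the reflective version defers the failure to runtime, which is the situation strong typing is meant to prevent.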

C++ combines subtyping and inheritance into a single structure, the inheritance hierarchy. A subclass is a subtype in C++. You can use a C++ subclass object anywhere you can use one of its parents. This in turn provides the rationale for multiple inheritance. Since strong typing prevents you from using an object that doesn't inherit from the designated superclass, you must use multiple inheritance to inherit that superclass. Otherwise, you have to turn off strong typing by using void pointers or something similar.

The interface provides an alternative to multiple inheritance through introduction of a completely different subtype relationship. In systems that support interfaces as types, you can use an object that supports an interface anywhere you can use the interface. Instead of declaring variables with the class type, you use interface types. This then permits you to refer to the interface operations on any object that provides those methods. It's still strong typing, but you get much of the flexibility of a weakly typed system such as Smalltalk without the complexity and strong coupling of multiple inheritance. The Java programming language supports this kind of typing. The downside to interfaces is that you can't really use them to inherit code, just the interface specification. You must implement all the operations in the interface with appropriate methods when you apply it to a class.
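As a rough illustration in Java, one class can support two unrelated interfaces without inheriting from two superclasses. The Printable interface and the identify and print operations below are hypothetical names invented for this sketch; they do not come from the Identification model.

```java
final class InterfaceTypingDemo {
    // Hypothetical interfaces; the operation names are assumptions.
    interface Identification {
        String identify();
    }

    interface Printable {
        String print();
    }

    // One class realizes both interfaces and must implement every operation.
    static final class Passport implements Identification, Printable {
        private final String number;

        Passport(String number) {
            this.number = number;
        }

        @Override
        public String identify() {
            return "passport " + number;
        }

        @Override
        public String print() {
            return "[" + identify() + "]";
        }
    }

    public static void main(String[] args) {
        Passport passport = new Passport("P-100");
        Identification id = passport; // variable typed with the interface
        Printable doc = passport;     // same object seen through a second interface
        System.out.println(id.identify());
        System.out.println(doc.print());
        // id.print() would not compile: Identification does not declare print().
    }
}
```

The variables id and doc are still strongly typed, but neither declaration ties the calling code to the Passport class itself.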

Interfaces support a special UML relationship: the "realizes" relationship. This is a kind of generalization relationship that represents not inheritance, but type realization. That is, when you associate an interface with a class, you are saying that the class realizes or implements the interface. The class can thus act as a subtype of the interface. You can instantiate objects of the class, then assign them to variables typed with the interface and use them through that interface. This is similar to what you do with superclasses, but you have access only to the interface, not to all the operations on the subclass.

Interface realization permits you to use overriding and late binding outside the class hierarchy, freeing you from the artificial constraints imposed by inheritance in a tree structure. For example, instead of instantiating the different Identification subclasses and putting them into a container typed with the Identification class, you can type the container as the Identification interface. This gives you the best of both worlds. You still have strong typing and type safety, but you can access an object through multiple mixed-in interfaces. Multiple inheritance gives you this ability as well but with a much greater degree of system coupling between classes, never a good idea.

Figure 7-9 shows the Identification hierarchy redone using interfaces. The diagram is simpler, and you have the additional flexibility of being able to use the Identification interface in other classes beyond the Identification hierarchy. Consider again the fingerprint identification example from the earlier section on "Multiple Inheritance." Just adding the Identification interface to FingerprintRecord, then building the appropriate methods to realize the three interface operations, lets you extend FingerprintRecord and use it wherever you use an Identification interface.
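The following Java sketch shows a container typed with the interface holding objects from unrelated classes. It assumes a single hypothetical verify operation in place of the three interface operations, which are not named here, and the scanQuality method is likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RealizationDemo {
    // Hypothetical operation standing in for the real interface operations.
    interface Identification {
        String verify();
    }

    static final class BirthCertificate implements Identification {
        @Override
        public String verify() {
            return "birth certificate checked";
        }
    }

    // Outside any Identification class hierarchy; it only realizes the interface.
    static final class FingerprintRecord implements Identification {
        @Override
        public String verify() {
            return "fingerprint matched";
        }

        // Not reachable through a variable typed as Identification.
        String scanQuality() {
            return "high";
        }
    }

    static List<String> verifyAll(List<Identification> ids) {
        List<String> results = new ArrayList<>();
        for (Identification id : ids) {
            results.add(id.verify()); // late binding selects the realizing class's method
        }
        return results;
    }

    public static void main(String[] args) {
        List<Identification> ids = new ArrayList<>();
        ids.add(new BirthCertificate());
        ids.add(new FingerprintRecord());
        System.out.println(verifyAll(ids));
    }
}
```

The container accepts FingerprintRecord even though the class shares no superclass with BirthCertificate, and code working through the container sees only the interface operations.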

Figure 7-9 also shows the two forms of the realization relationship. The dashed arrow with the white arrowhead is a formal realizes relationship, showing the connection of Identification to BirthCertificate. The other classes, to save space, show the realizes relationship as a small, labeled circle connected to the class. The label is the interface name.

Figure 7-9 is less complex, but you can easily see why interfaces are not the preferred way of doing every kind of subtyping. Although inheriting the GetExpireDate abstract operation is logically the same as inheriting the ExpireDate attribute, practically the latter makes more sense. This is especially true in a data model destined for implementation in a DBMS logical schema. As a rule, if you have meaningful attributes as common properties, you should use a generalization; if you have only operations, you should use interfaces. Identification meets this test (ignoring the object identifier), while ExpiringID does not. The FingerprintRecord class is the best example of good interface realization. If you can use the phrase "is used as," as in "a fingerprint record is used as identification," you are looking at interface realization. If you use the phrase "is a," as in "a passport is an expiring ID," that's generalization. Finally, Figure 7-9 also shows a feature of interfaces: interface inheritance. The ExpiringID interface inherits the Identification interface, and you can use an object with an ExpiringID interface anywhere you use an Identification interface.
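A hedged Java sketch of these two ideas appears below: interface inheritance between ExpiringID and Identification, and a generalization that carries the expire date as a stored attribute. The verify operation, the ExpiringIdBase class, and the isExpired helper are assumptions for illustration, and getExpireDate follows Java naming rather than the GetExpireDate spelling in the figure.

```java
import java.time.LocalDate;

final class ExpiringIdDemo {
    interface Identification {
        String verify(); // hypothetical operation
    }

    // Interface inheritance: an ExpiringID can be used wherever an Identification can.
    interface ExpiringID extends Identification {
        LocalDate getExpireDate();
    }

    // Generalization: the superclass owns the attribute, so subclasses inherit stored state.
    abstract static class ExpiringIdBase implements ExpiringID {
        private final LocalDate expireDate;

        ExpiringIdBase(LocalDate expireDate) {
            this.expireDate = expireDate;
        }

        @Override
        public LocalDate getExpireDate() {
            return expireDate;
        }
    }

    // "A passport is an expiring ID": generalization.
    static final class Passport extends ExpiringIdBase {
        Passport(LocalDate expireDate) {
            super(expireDate);
        }

        @Override
        public String verify() {
            return "passport checked";
        }
    }

    static String verifyAny(Identification id) {
        return id.verify();
    }

    static boolean isExpired(ExpiringID id, LocalDate today) {
        return id.getExpireDate().isBefore(today);
    }

    public static void main(String[] args) {
        Passport passport = new Passport(LocalDate.of(2030, 1, 1));
        System.out.println(verifyAny(passport));
        System.out.println(isExpired(passport, LocalDate.of(2031, 1, 1)));
    }
}
```

Only the abstract class stores data, which is why the attribute-bearing concept maps more naturally to a generalization when the model is headed for a database schema.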

There is a third kind of type-related concept in programming languages: genericity. Languages that support templates (C++) or generics (Ada, Eiffel) give you the ability to create multiple versions of an implementation of a class or function, each one parameterized with a series of types and/or objects. When you instantiate an object typed with the generic class, you supply appropriate types and objects as parameter arguments. All the operations and attributes take on the appropriate types, and you can use the object in a strongly typed system. Classic uses for generics are basic data structures such as stacks and trees and generic algorithms such as sorts. You parameterize the kind of objects the aggregate structures contain, for example.
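A generic stack gives a small example of the idea. This sketch uses Java generics, which share one compiled implementation across type arguments rather than generating a separate version per instantiation as C++ templates do, so treat it as an approximation of the concept; the Stack class and its method names are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

final class GenericStackDemo {
    // T parameterizes the kind of object the aggregate structure contains.
    static final class Stack<T> {
        private final Deque<T> items = new ArrayDeque<>();

        void push(T item) {
            items.addFirst(item); // ArrayDeque rejects null elements
        }

        T pop() {
            if (items.isEmpty()) {
                throw new NoSuchElementException("stack is empty");
            }
            return items.removeFirst();
        }

        boolean isEmpty() {
            return items.isEmpty();
        }
    }

    public static void main(String[] args) {
        Stack<String> names = new Stack<>();
        names.push("passport");
        names.push("birth certificate");
        System.out.println(names.pop());

        Stack<Integer> counts = new Stack<>();
        counts.push(7);
        // counts.push("seven") would not compile: the element type is Integer.
        System.out.println(counts.pop());
    }
}
```

Each declaration supplies a type argument, and the compiler then checks every push and pop against that type.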

Figure 7-9: The Identification Interface and Interface Realization

Data modeling can make good use of typing. Most database programming languages are strongly typed, though some of them are curious in that regard. For example, SQL does not regard a table definition as a type. When you create a named table, you are creating an aggregate object (Date's "relvar" or relational variable), not a type you can use to specify the type of an attribute somewhere. Cursors provide a generic way to access these objects. You can look at most current relational DBMS CASE tools for the specific consequence for data modeling. When you define the schema, you create an entity. When you forward-engineer that schema, that entity becomes a table in the database. If you want to define multiple tables with the same structure, you have to copy the existing entity or create a new one that looks identical. These things don't behave as types. The later section on "Domain Constraints" discusses some of these attribute type issues in more detail.

ORDBMS products extend the typing system with user-defined types (UDTs in the jargon of the vendors and the SQL3 standard). You define the type, then instantiate tables from it. You can access all the objects of a given type with specific variations on the SELECT statement, or you can access just those in a single table instance [Stonebraker and Brown 1999]. This opens up a world of interesting storage options by freeing up the storage of the rows/objects from their type definition. In the RDBMS, you need to define the physical model for the entire set of objects of the entity. In the ORDBMS, you can store multiple tables of the same type in totally different locations.

ORACLE8's table partitioning scheme lets you automate the process without introducing instance logical location issues. The table is one table stored in multiple locations defined by the data, not multiple tables of the same type, though you can do that too.

Most OODBMS products adopted the C++ model (Gemstone being the early exception with its Smalltalk format). For better or worse, this has meant the integration of typing and inheritance in OODBMS schemas. One difference with the OODBMS products is that you create objects of the type and store them in logical (not physical) containers. Not one of these products, as far as I am aware, supports interfaces or generics directly. Interfaces are useful in the data model as a way of expressing the realization relationship, but the translation to the logical and physical models requires quite a bit of work. Generics or templates simply don't exist in the database world.

While generalization in all its variety is a powerful data modeling tool, it pales in the light of the other major type of relationship: the association.

Source: Robert Muller, Database Design for Smarties: Using UML for Data Modeling, pages 99-102.