During the 1970s, there were several strands of software research that cross-pollinated each other with ideas about the semantic relationships between data structures. The term "semantic" means of or relating to the meaning of an object, as opposed to its structure (syntax) or its use (pragmatics). Chen's ER notation introduced part of this background data theory by integrating relationships and their constraints as abstractions that made a valuable modeling contribution. Later ER notations went even further, introducing the concepts of subtyping [Teorey, Yang, and Fry 1986] and aggregation [Bruce 1992].
Tip
You can usually do a perfectly good database design using ER diagramming techniques without worrying much about subtyping and aggregation. Once you start moving in the direction of OO design with UML, however, you will find these issues much more important.
Subtyping is also known by several other names: generalization, inheritance, is-a, is-a-kind-of, or subclassing. Each comes from using the basic concept of generalization in different contexts. The generalization relationship specifies that a supertype generalizes the properties of several subtypes. In other words, the supertype contains the properties that the subtypes share, and all subtypes inherit the properties of the supertype. Often the supertype is purely an abstract type, meaning that there are no instances of the supertype, just of the subtypes.
Note
Chapter 7 goes into much more detail on inheritance, which is a fundamental property of class modeling. It also discusses multiple inheritance (having multiple supertypes), which you never find in ER models, or at least I haven't observed it in the ER salt marshes I've visited.
Figure 6-7 illustrates a subtype relationship expressed in IDEF1X notation. An identification is an abstract type of object that identifies a person, and there are many different kinds of identification. The subtype hierarchy in Figure 6-7 shows an additional abstract subtype, identifications that have expiration dates (passports, driver's licenses, and so on), as opposed to those that do not (birth certificates).
You can add mutual exclusivity as a constraint on this relationship as well [Teorey, Yang, and Fry 1986; Teorey 1999]. If the subtypes of an entity are mutually exclusive, it means that an object or instance of the entity must be precisely one of the subtypes. The identification hierarchy represents this kind of subtyping. You can't have an identification document that is simultaneously a passport and a driver's license, for example. If the subtypes are not mutually exclusive, then they overlap in meaning. You could imagine a document that would be simultaneously a passport, driver's license, and birth certificate, I suppose. A better example would be an organization that is both a criminal organization and a legal company, a very common situation in the real world. The underlying reality of this way of looking at subtypes is as overlapping sets of objects. Mostly, you want to avoid this, as it makes the objects (rows in the database) interdependent and hence harder to maintain through changes. On the other hand, if the underlying reality really does overlap, your modeling should reflect that.
Finally, you can constrain the generalization relationship with a completeness constraint [Teorey, Yang, and Fry 1986; Teorey 1999]. If the subtypes of an entity are complete, it means that an object of that kind must be one of the subtypes, and that there are no additional (unknown) subtypes. This constraint lets you express the situation where your subtypes are a logical division of the supertype that exhaustively covers the possibilities. Descriptive objects such as identity documents are almost never complete. Usually, this constraint applies to divisions that you make to handle conceptual or abstract categorizations of the data, such as grouping organizations into the categories of criminal, legitimate, and "other." Having the "other" category makes this set of subtypes complete.
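Because both constraints are statements about sets of objects, a small sketch can make them concrete. The following Python is an illustration only: the function names, the document and organization identifiers, and the membership data are all hypothetical and do not come from the Holmes PLC model. Mutual exclusivity is treated as pairwise disjointness of the subtype sets, and completeness as the subtype sets jointly covering the supertype.

```python
from itertools import combinations

def is_mutually_exclusive(subtypes):
    """Return True if no object belongs to more than one subtype."""
    return all(a.isdisjoint(b) for a, b in combinations(subtypes.values(), 2))

def is_complete(supertype, subtypes):
    """Return True if every supertype object belongs to at least one subtype."""
    covered = set().union(*subtypes.values())
    return supertype <= covered

# Hypothetical identification documents: doc3 is of a kind (say, a birth
# certificate) that has not been modeled as a subtype yet.
documents = {"doc1", "doc2", "doc3"}
document_subtypes = {
    "passport": {"doc1"},
    "drivers_license": {"doc2"},
}

# Hypothetical organizations: org_b is both criminal and legitimate,
# and the "other" category catches everything else.
organizations = {"org_a", "org_b", "org_c", "org_d"}
organization_subtypes = {
    "criminal": {"org_a", "org_b"},
    "legitimate": {"org_b", "org_c"},
    "other": {"org_d"},
}

print(is_mutually_exclusive(document_subtypes), is_complete(documents, document_subtypes))
print(is_mutually_exclusive(organization_subtypes), is_complete(organizations, organization_subtypes))
```

Under these assumed data, the document subtypes are mutually exclusive but incomplete, while the organization subtypes are complete but overlapping.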
Figure 6-7: The Identification Subtype Hierarchy Using IDEF1X Notation
Aggregation links two entities in a relationship that is tighter than the usual relationship. The aggregation relationship is often called a "part-of" relationship, because it expresses the constraint that one entity is made up of several other entities. There is a semantic difference between being a part of something and something "having" something else [Booch 1994, pp. 64–65, 102, 128–129; Rumbaugh et al. 1992, pp. 57–61; Teorey 1999, p. 26].
The most common situation in which you find aggregation useful is the parts explosion, or more generally the idea of physical containment. A parts explosion is a structural situation where one object is made up of a set of other objects, all of the same type. The parts explosion thus represents a tree or graph of parts. Querying such a tree yields the transitive closure of the graph, the collection of all the parts that make up the tree.
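To make the idea of querying a parts tree concrete, here is a minimal Python sketch. It assumes each part row carries a reference to its parent part, with top-level parts having no parent. The part names echo the quotation later in this section, but the rows, the hierarchy among them, and the function names are hypothetical. The walk is breadth-first, and the `seen` set is a guard so that the function still terminates if the data turns out to be a graph rather than a tree.

```python
from collections import defaultdict, deque

# Hypothetical rows of a parts table: (part_id, name, parent_id).
# A parent_id of None marks a top-level part.
PARTS = [
    (1, "air-gun", None),
    (2, "stock", 1),
    (3, "barrel", 1),
    (4, "breech-block", 3),
    (5, "sights", 3),
    (6, "lever", 2),
]

def build_children_index(rows):
    """Map each parent id to the list of ids of its direct parts."""
    children = defaultdict(list)
    for part_id, _name, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(part_id)
    return children

def explode(root_id, rows):
    """Return the ids of all direct and indirect parts of root_id."""
    children = build_children_index(rows)
    seen = set()
    order = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guard against shared parts or cycles
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

print(explode(1, PARTS))
print(explode(3, PARTS))
```

The loop is the work that a single relational join cannot express: each pass finds the parts one level further down, and the number of levels is not known in advance.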
Note
The parts explosion or transitive closure is a hard problem for relational databases. Some SQL dialects, such as Oracle SQL, have operators that do this kind of query (CONNECT BY), but most don't. As a result, SQL programmers generally spend a lot of time programming the retrieval of these structures. Identifying an aggregate thus tells you interesting things about the amount of work involved in your application, at least if you're using a standard relational database.
Holmes PLC is as much interested in parts as in wholes:
He passed close beside us, stole over to the window, and very softly and noiselessly raised it for half a foot. As he sank to the level of this opening, the light of the street, no longer dimmed by the dusty glass, fell full upon his face. The man seemed to be beside himself with excitement. His two eyes shone like stars, and his features were working convulsively. He was an elderly man, with a thin, projecting nose, a high, bald forehead, and a huge grizzled moustache. An opera hat was pushed to the back of his head, and an evening dress shirt-front gleamed out through his open overcoat. His face was gaunt and swarthy, scored with deep, savage lines. In his hand he carried what appeared to be a stick, but as he laid it down upon the floor it gave a metallic clang. Then from the pocket of his overcoat he drew a bulky object, and he busied himself in some task which ended with a loud, sharp click, as if a spring or bolt had fallen into its place. Still kneeling upon the floor, he bent forward and threw all his weight and strength upon some lever, with the result that there came a long, whirling, grinding noise, ending once more in a powerful click. He straightened himself then, and I saw that what he held in his hand was a sort of a gun, with a curiously misshapen butt. He opened it at the breech, put something in, and snapped the breech-block. Then, crouching down, he rested the end of the barrel upon the ledge of the open window, and I saw his long moustache droop over the stock and his eye gleam as it peered along the sights. I heard a little sigh of satisfaction as he cuddled the butt into his shoulder, and saw that amazing target, the black man on the yellow ground, standing clear at the end of his foresight. For an instant he was rigid and motionless. Then his finger tightened on the trigger. There was a strange, loud whiz and a long, silvery tinkle of broken glass. At that instant, Holmes sprang like a tiger on to the marksman's back, and hurled him flat upon his face.
… Holmes had picked up the powerful air-gun from the floor, and was examining its mechanism.
"An admirable and unique weapon," said he, "noiseless and of tremendous power: I knew Von Herder, the blind German mechanic, who constructed it to the order of the late Professor Moriarty. For years I have been aware of its existence, though I have never before had the opportunity of handling it. I commend it very specially to your attention, Lestrade, and also the bullets which fit it." [EMPT]
Figure 6-8 illustrates the basic parts explosion in IDEF1X notation. Each gun component is a part of another, with the top-level ones (the air-gun in the above quotation, for example) having a null relationship (no parent). This structure represents a forest of mechanical trees, a one-to-many relationship between mechanisms and parts. There is a case for making the relationship many-to-many, representing a forest of graphs or networks instead of hierarchical trees. The aggregation adornment is the diamond shape at one end of the relationship line. The relationship is strong, or "nonidentifying," so the entity gets a foreign key attribute that is not part of the primary key ("is part of") and a dashed line (see the later section "Strong and Weak Relationships" for more information about nonidentifying relationships).
While aggregation can be interesting and useful, it seldom has many consequences for the underlying database design. The main consequence is to couple the design more tightly. When you design a system, you break the system into subsystems, uncoupling the elements as much as possible. This permits the reuse of the elements in different situations. When you identify an aggregate, you are usually saying that you can't separate the aggregate from the entity it aggregates. That generally means you can't break the two apart into separate subsystems. Often, genericity (templates or generics) provides a way to gain reuse while still allowing for the tight coupling of the entities (as instantiated templates or generics). As ER diagramming has no concept of genericity, however, the concept isn't of much value in ER aggregation situations.
Also, in OO design, aggregation is a kind of abstraction, just like inheritance. In this case, you are abstracting and encapsulating the aggregated objects under the aggregate. Usually, your aggregate will have operations that manipulate the encapsulated objects that make it up, hiding the structural details. For example, to search a tree, you create an operation that returns an iterator, which in turn lets you walk the tree in some well-defined order. In design terms, if you don't want this level of encapsulation, you shouldn't designate the relationship as an aggregate.
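As a rough illustration of this kind of encapsulation, the Python sketch below keeps an aggregate's parts in a private list and exposes them only through an iterator that walks the tree depth-first, visiting each part before its components. The `Part` class, its `add` method, and the sample parts are hypothetical names chosen for the example, not part of any design in this book.

```python
class Part:
    """An aggregate that hides its components behind an iterator."""

    def __init__(self, name):
        self.name = name
        self._components = []  # encapsulated: callers never touch this list

    def add(self, component):
        """Attach a component to this part and return it for chaining."""
        self._components.append(component)
        return component

    def __iter__(self):
        """Yield this part, then every nested part, depth-first."""
        yield self
        for component in self._components:
            yield from component

# Build a small hypothetical mechanism.
gun = Part("air-gun")
barrel = gun.add(Part("barrel"))
barrel.add(Part("sights"))
gun.add(Part("stock"))

names = [part.name for part in gun]
print(names)
```

Callers see only the order of the walk. The aggregate could change how it stores its components without affecting any code that iterates over it, which is the hiding of structural details that the designation as an aggregate is meant to signal.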
Figure 6-8: The Air Gun Parts Explosion
Note
You can see almost any component relationship as aggregation if you stretch the semantics enough. You can call the attributes of an entity an aggregation, for example: the entity is made up of the attributes. While this may be possible, it isn't useful for design. You should limit your use of aggregation to situations such as parts explosions or physical containment, where the result means something special for your design and code. This is especially true when it means something for your database structures, as in the case of the criminal organization tree.