Much of the information in an information system is about relationships. However, most data models do not provide a direct way to describe such relationships; instead, they provide a variety of representational techniques (record formats, data structures). Implicit in most of these, and in the accompanying restrictions in the data processing system, is the ability to support some forms of relationships very well, some rather clumsily, and some not at all.
In order to assess the capabilities of a data model, it would help to have some systematic understanding of the various forms of relationships that can occur in real information. In the next few paragraphs I will discuss some significant characteristics of relationships. A particular “form” of a relationship is then some combination of these characteristics. A method for assessing a data model would include a determination of which forms it supported well, poorly, or not at all. Note the emphasis on combinations. In most data models you can probably manage to find a way to obtain most of the following features, taken one at a time. The challenge is to support relationships having various combinations of these features. By “support,” I mean that
the system somehow permits a constraint to be asserted for the relationship (e.g., that it is one-to-many), and
the system thereafter enforces the constraint (e.g., will not allow the recording of an employee's assignment to more than one department at a time).
Such support is often implicit in the data structure (e.g., hierarchy), rather than being declared explicitly.
Note that Kent's use of the term “support” corresponds to what today is called “referential integrity.”
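The two parts of “support” can be sketched in Python with the hypothetical class below. A one-to-many constraint is asserted once, by choosing the class, and is then enforced on every insert. The class and method names, and the employees and departments, are illustrative assumptions, not any particular system's API.

```python
from __future__ import annotations


class ConstraintViolation(Exception):
    """Raised when recording a pair would break the asserted constraint."""


class OneToManyRelationship:
    """Hypothetical store for one binary relationship.

    Creating it asserts the constraint: each element on the "many" side
    (e.g. an employee) relates to at most one element on the "one" side
    (e.g. a department). Every call to record() then enforces it.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._partner: dict[str, str] = {}  # many-side element -> one-side element

    def record(self, many: str, one: str) -> None:
        current = self._partner.get(many)
        if current is not None and current != one:
            raise ConstraintViolation(
                f"{many} is already related to {current} via '{self.name}'"
            )
        self._partner[many] = one

    def partner(self, many: str) -> str | None:
        return self._partner.get(many)


works_in = OneToManyRelationship("works in")
works_in.record("Smith", "Sales")
works_in.record("Jones", "Sales")  # several employees per department: allowed
try:
    works_in.record("Smith", "Research")  # a second department: rejected
except ConstraintViolation as exc:
    print(exc)  # Smith is already related to Sales via 'works in'
```

In a system where the constraint is implicit in the data structure, the single-valued dictionary would be the hierarchy itself. Here the check is written out explicitly so that both the assertion and the enforcement are visible.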
The set of characteristics listed below is probably incomplete—I imagine it will always be possible to think of additional relevant criteria. For simplicity, we are now only considering “binary” relationships, i.e., those of degree two. Most of the concepts can be readily generalized to “n-ary” relationships (those of any degree).
COMPLEXITY
Relationships might be one-to-one (departments and managers, monogamous husbands and wives), one-to-many (departments and employees), or many-to-many (students and classes, parts and warehouses, parts assemblies). The relationship between employees and their current departments is (typically) one-to-many, whereas the relationship between employees and all the departments they have worked in (as recorded in personnel history files) is many-to-many.
Another way to characterize complexity is to describe each direction of the relationship separately as simple (mapping one element to one) or complex (mapping one element to many). The terms “singular” and “multiple” are also used. Thus “manager of department” is simple in both directions; “manager of employee” is simple in one direction and complex in the other. Relative to the number of “forms” of relationships, this would count as four possibilities, since a given relationship might be simple or complex in each direction.
One advantage to this latter view is that it corresponds well with certain aspects of data extraction. Very often a relationship is being traversed in one direction (e.g., find the department of a given employee); the data processing system usually has to anticipate whether the result will contain one element or many (e.g., whether an employee might be in more than one department). The complexity of the reverse direction is of little concern (i.e., whether or not there are also other employees in the department).
Thus, if a given direction is complex, it doesn't matter much whether the relationship is 1:n or m:n. If the direction is simple, the distinction between n:1 and 1:1 may be immaterial.
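Here is a sketch of the per-direction view, assuming a relationship stored as a set of (from, to) pairs. The hypothetical traverse() function takes the declared complexity of the direction being followed and returns either a single element or a list. The complexity of the reverse direction never enters into it, which is why 1:n and m:n look the same when traversed from the complex side.

```python
from __future__ import annotations

from enum import Enum
from itertools import product


class Direction(Enum):
    SIMPLE = "simple"    # maps an element to at most one element
    COMPLEX = "complex"  # maps an element to possibly many elements


# One value per direction of a binary relationship: 2 x 2 = 4 forms.
COMPLEXITY_FORMS = list(product(Direction, repeat=2))


def traverse(pairs: set[tuple[str, str]], start: str, direction: Direction):
    """Follow the relationship from `start`, left to right.

    Only the complexity of the direction being traversed matters: it tells
    the caller whether to expect a single element or a collection.
    """
    targets = sorted(b for a, b in pairs if a == start)
    if direction is Direction.SIMPLE:
        if len(targets) > 1:
            raise ValueError(f"{start} maps to several elements: {targets}")
        return targets[0] if targets else None
    return targets


works_in = {("Smith", "Sales"), ("Jones", "Sales")}
staff_of = {(dept, emp) for emp, dept in works_in}  # the reverse direction

print(traverse(works_in, "Smith", Direction.SIMPLE))   # Sales
print(traverse(staff_of, "Sales", Direction.COMPLEX))  # ['Jones', 'Smith']
```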
It's amusing to note that the relationship between postal zip codes and states in the United States is almost many-to-one, so that the zip code directory is organized hierarchically as zip codes within states. The relationship is really many-to-many, but there are only about four zip codes that actually span state boundaries. The post office copes with that by listing the exceptions at the front of the directory.
CATEGORY CONSTRAINTS
Either side of a binary relationship might be constrained to a single category, constrained to any of several specified categories, or unconstrained (three possibilities on each side, for a total of nine combinations). Constraint to a single category is probably the most common situation, as in the examples above under “Complexity.”
Constraint to a set of categories occurs, for example, when a person can “own” things in several different categories, or when the owner might be a person, department, division, company, agency, or school. This case might be avoided by defining one new category as the union of the others, if you're dealing with a data model which permits overlapping categories.
Defining such a union category is also known as subtyping: you define a generic concept, called a supertype, that contains all of the properties common to other entity types, called subtypes. For example, the generic concept of Event might contain the common properties of different types of events such as Order, Return, and Shipment. Event is the supertype, and Order, Return, and Shipment are its subtypes.
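Here is a minimal sketch of the Event example using Python dataclass inheritance. The supertype holds the shared properties and each subtype adds its own. The specific properties (event_id, event_date, quantity, reason, carrier) and the record_event() helper are hypothetical; the text names only the entity types.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    """Supertype: the properties shared by every kind of event."""
    event_id: int
    event_date: date


@dataclass
class Order(Event):
    quantity: int = 0  # hypothetical order-only property


@dataclass
class Return(Event):
    reason: str = ""  # hypothetical return-only property


@dataclass
class Shipment(Event):
    carrier: str = ""  # hypothetical shipment-only property


def record_event(log: list[Event], item: object) -> None:
    """A relationship constrained to the category Event accepts any subtype."""
    if not isinstance(item, Event):
        raise TypeError(f"{type(item).__name__} is not an Event")
    log.append(item)


log: list[Event] = []
record_event(log, Order(1, date(2024, 1, 5), quantity=3))
record_event(log, Shipment(2, date(2024, 1, 6), carrier="rail"))
print([type(e).__name__ for e in log])  # ['Order', 'Shipment']
```

The relationship is constrained to one category, Event, yet it accepts all three kinds of event. This is how the union category turns a “set of categories” constraint back into a single-category one.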
It is hard to think of a relationship that is naturally unconstrained as to category (i.e., one that applies to every kind of thing), but it often makes sense to handle a relationship that way in a real data processing system. Perhaps the relationship does happen to apply to all of the things represented in this particular database, or to so many of them that it isn't worth checking for the few exceptions. Perhaps the installation doesn't want to incur the overhead of enforcing the constraint, and trusts the applications to assert only sensible relationships. Or, the system simply may not provide any mechanism for asserting and enforcing such constraints.
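The three options for each side can be sketched as a small hypothetical CategoryConstraint type. None stands for unconstrained, a one-element set for a single category, and a larger set for several categories. The “owns” example uses the owner categories listed above. Leaving the owned side unconstrained is an illustrative choice in the spirit of the preceding paragraph, not a recommendation.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CategoryConstraint:
    """Category constraint on one side of a binary relationship.

    allowed=None means unconstrained; one category means a single-category
    constraint; several categories mean "any of these".
    """
    allowed: frozenset[str] | None = None

    def kind(self) -> str:
        if self.allowed is None:
            return "unconstrained"
        return "single" if len(self.allowed) == 1 else "several"

    def admits(self, category: str) -> bool:
        return self.allowed is None or category in self.allowed


# Three kinds per side gives 3 x 3 = 9 combinations.
KINDS = ("single", "several", "unconstrained")
CATEGORY_FORMS = list(product(KINDS, repeat=2))

# "Owns": the owner may be any of several categories; the owned side is left
# unconstrained, trusting applications to assert only sensible pairs.
owner = CategoryConstraint(frozenset(
    {"person", "department", "division", "company", "agency", "school"}))
owned = CategoryConstraint()

print(owner.kind(), owned.kind())                     # several unconstrained
print(owner.admits("agency"), owner.admits("truck"))  # True False
```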
SELF-RELATION
Three possibilities:
1. The relationship is not meaningful between things in the same category.
2. Things in the same category may be so related, but a thing may not be related to itself.
3. Things may be related to themselves.
The first case is again probably the most common. The second occurs, for example, in organization charts and parts assemblies. Examples of the third are our representatives in government (the representative is one of his own constituents), and canvassers for fund drives (the canvasser collects from himself).
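The three self-relation rules can be sketched as an enum plus a check on a single pair. The sketch assumes mutually exclusive categories, as the text does, so each thing carries exactly one category label. The rule names and the sample parts and person are hypothetical.

```python
from enum import Enum


class SelfRelation(Enum):
    NOT_WITHIN_CATEGORY = 1   # case 1: never between things of the same category
    NOT_TO_ITSELF = 2         # case 2: same category allowed, but not a thing to itself
    MAY_RELATE_TO_ITSELF = 3  # case 3: a thing may be related to itself


def permits(rule: SelfRelation, a: str, cat_a: str, b: str, cat_b: str) -> bool:
    """Check one pair (a, b) against a self-relation rule.

    Assumes mutually exclusive categories, so each thing has exactly one.
    Pairs across different categories are not restricted by this rule.
    """
    if cat_a != cat_b:
        return True
    if rule is SelfRelation.NOT_WITHIN_CATEGORY:
        return False
    if rule is SelfRelation.NOT_TO_ITSELF:
        return a != b
    return True


# Parts assembly (case 2): a part may contain other parts, but not itself.
print(permits(SelfRelation.NOT_TO_ITSELF, "bicycle", "part", "wheel", "part"))  # True
print(permits(SelfRelation.NOT_TO_ITSELF, "wheel", "part", "wheel", "part"))    # False
# Representative and constituent (case 3): Lee is among Lee's own constituents.
print(permits(SelfRelation.MAY_RELATE_TO_ITSELF, "Lee", "person", "Lee", "person"))  # True
```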
Incidentally, I am thinking here of the simple case where categories are mutually exclusive. When categories overlap, as in subsets, things may be more complicated.
Self-relation is represented in a data model by a recursive relationship, which is a relationship that starts and ends at the same entity type. Recursive relationships allow for a lot of flexibility, but at the high price of reducing model readability and obscuring business rules.
For example, in the following figure, we see two ways of modeling a sales organization:
The entity Sales Organization Level related to itself (left) produces an extremely flexible structure: we can have any number of levels, and we can even gracefully change the rule so that, instead of a Zone containing Territories, a Territory now contains Zones. However, the cost of this flexibility is often obscurity, because recursion hides business rules and makes the model harder to use as a communication tool. The model on the right, by contrast, has no recursion and shows the four levels clearly. But adding a fifth level would require changing the model and updating the resulting database and code.
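Here is a minimal sketch of the recursive (left-hand) model, assuming a parent/children link on a single hypothetical SalesOrgLevel type. Zone and Territory come from the text; the instance names are made up. Because the relationship starts and ends at the same type, both nesting orders are accepted without any structural change.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class SalesOrgLevel:
    """A node in a recursive Sales Organization Level structure.

    The relationship "contains" starts and ends at this same entity type, so
    any number of levels can be stacked without changing the structure.
    """
    name: str
    level_type: str  # e.g. "Zone" or "Territory"; nothing restricts nesting order
    parent: SalesOrgLevel | None = field(default=None, repr=False)
    children: list[SalesOrgLevel] = field(default_factory=list)

    def add(self, child: SalesOrgLevel) -> SalesOrgLevel:
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Labels from the top of the hierarchy down to this node."""
        labels: list[str] = []
        node: SalesOrgLevel | None = self
        while node is not None:
            labels.append(f"{node.level_type} {node.name}")
            node = node.parent
        return labels[::-1]


zone = SalesOrgLevel("North", "Zone")
territory = zone.add(SalesOrgLevel("T1", "Territory"))
print(territory.path())  # ['Zone North', 'Territory T1']

# Reversing the rule (a Territory containing Zones) needs no schema change.
inverted = SalesOrgLevel("T2", "Territory").add(SalesOrgLevel("South", "Zone"))
print(inverted.path())  # ['Territory T2', 'Zone South']
```

The second example shows both sides of the trade-off. Nothing in the code had to change to allow it, but nothing in the code states which level may contain which either. In the non-recursive model, that business rule would be written into the structure.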
OPTIONALITY
On either side of the binary relationship, the relationship might be optional (not everybody is married) or mandatory (every employee must have a department). I will count this as four combinations (two possibilities on each side), although there could conceivably be more: one of the domains may include several categories, with the relationship being optional in some categories and mandatory in others.
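Optionality can be checked after the fact, as in this sketch. For each side marked mandatory, it reports the elements that appear in no pair. The Optionality type and violations() function are hypothetical. Treating the department side as optional is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Optionality:
    """Whether participation is mandatory on each side: 2 x 2 = 4 combinations."""
    left_mandatory: bool
    right_mandatory: bool


def violations(rule: Optionality, left: set[str], right: set[str],
               pairs: set[tuple[str, str]]) -> dict[str, set[str]]:
    """Report elements on a mandatory side that take no part in any pair."""
    problems: dict[str, set[str]] = {}
    if rule.left_mandatory:
        missing = left - {a for a, _ in pairs}
        if missing:
            problems["left"] = missing
    if rule.right_mandatory:
        missing = right - {b for _, b in pairs}
        if missing:
            problems["right"] = missing
    return problems


# Every employee must have a department; a department may (here) be empty.
rule = Optionality(left_mandatory=True, right_mandatory=False)
employees = {"Smith", "Jones", "Brown"}
departments = {"Sales", "Research"}
works_in = {("Smith", "Sales"), ("Jones", "Sales")}
print(violations(rule, employees, departments, works_in))  # {'left': {'Brown'}}
```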
THE NUMBER OF FORMS
Even with this limited list of characteristics, we already have 432 forms (4 x 9 x 3 x 4). This number might include some symmetries, duplicates, and meaningless combinations, but after subtracting these we still have a sizable checklist.
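The arithmetic can be checked by enumerating every combination with itertools.product. The second count rests on an assumption the text does not state. It treats a form and its mirror image (the same relationship read from the other side, with both sides' characteristics swapped) as one of the “symmetries” to be subtracted.

```python
from itertools import product

complexity = list(product(("simple", "complex"), repeat=2))                        # 4
category = list(product(("single", "several", "unconstrained"), repeat=2))         # 9
self_relation = ["not within category", "not to itself", "may relate to itself"]   # 3
optionality = list(product(("optional", "mandatory"), repeat=2))                   # 4

forms = list(product(complexity, category, self_relation, optionality))
print(len(forms))  # 432


def mirror(form):
    """The same form described from the other side of the relationship."""
    (c1, c2), (k1, k2), s, (o1, o2) = form
    return ((c2, c1), (k2, k1), s, (o2, o1))


# Treating a form and its mirror image as one leaves 234 distinct forms.
distinct = {min(f, mirror(f)) for f in forms}
print(len(distinct))  # 234
```

Of the 432 forms, 36 are their own mirror image: they have the same complexity, category constraint, and optionality on both sides. The remaining 396 pair up, giving 36 + 198 = 234. Duplicates and meaningless combinations would reduce the count further.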