Uniqueness, Scope, and Qualifiers

Whether a name refers to one thing or many frequently depends on the set of candidates available to be referenced. This set of candidates comprises a “scope,” and it is often implicit in the environment in which the naming is done. A reference to “Harry” is often understood to mean the Harry present in the room. A letter addressed to Portland (without naming the state) will probably be delivered in Oregon if mailed on the West Coast, and in Maine if mailed on the East Coast. The boundaries of a scope, and the implicit default rules, are often fuzzy: I don't know where the letter would go if it was mailed in Illinois.

Recall our earlier discussion on context. “What is the scope of uniqueness?” is a frequent question heard from the analyst or modeler. The broader the context, the greater the effort to integrate. If the Admissions department and Alumni Affairs department at a university each define “student” differently, modeling student just for the Admissions department is a much easier assignment than modeling student for the whole university including Alumni Affairs, as we would avoid the integration issues associated with multiple perceptions.

Qualification, the specification of additional terms in a name, is often used to resolve such ambiguities by making the intended scope more explicit. In the Portland example, adding the state name would (partially) resolve the ambiguity.

Scopes are often nested, and we often employ a mixed convention: a larger scope is left implicit, but a sub-scope within it is explicitly specified. This is partial qualification. There are cities named San Jose in Costa Rica and in the United States. Let's imagine that the one in Costa Rica is within a “district” named California. Then the address “San Jose, California,” although qualified, is still ambiguous. Whether the letter gets to its intended destination depends on the “default scope” (i.e., country) implied by the point at which it is mailed.
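Partial qualification with a default scope can be sketched in a few lines of Python. This is only an illustration: the `PLACES` set, the `resolve` function, and its parameter names are invented here, and the Costa Rican "California" is the imagined district from the paragraph above, not a real one.

```python
# Hypothetical gazetteer: each fully qualified name is (country, region, city).
PLACES = {
    ("United States", "California", "San Jose"),
    ("Costa Rica", "California", "San Jose"),  # the imagined district from the text
    ("United States", "Oregon", "Portland"),
    ("United States", "Maine", "Portland"),
}


def resolve(city, region=None, country=None, default_country=None):
    """Return every place matching a possibly partial qualification.

    Qualifiers left as None are unconstrained, except that a missing
    country falls back to default_country (the implicit scope).
    """
    country = country if country is not None else default_country
    return sorted(
        place
        for place in PLACES
        if place[2] == city
        and (region is None or place[1] == region)
        and (country is None or place[0] == country)
    )


# Partially qualified, with no default scope: still ambiguous.
ambiguous = resolve("San Jose", region="California")

# The same name, mailed from inside Costa Rica: the default scope decides.
local = resolve("San Jose", region="California", default_country="Costa Rica")
```

Here `ambiguous` holds two candidates while `local` holds one. The name itself did not change between the two calls; only the implied scope did.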

Even the city name is a scope, resolving the ambiguity of a street address: University Avenue exists in many cities. And the street name selects a scope of house numbers. A complete address is a whole chain of scope qualifiers.

Telephone numbers provide familiar examples of qualification. A (7-digit) phone number is certainly not unique; it may exist within many different area codes. Here the boundaries of the scopes, and the default rules, are well defined. Incidentally, phone numbers illustrate some kinds of anomalies that may occur in real naming conventions:

Different forms of names are valid within different scopes: for local extensions, they are four digits; for outside numbers, they are seven digits plus an optional area code.

Form and content (syntax and semantics) are mixed together. You can't specify the naming rules independent of the numbers involved. Certain initial digits are reserved for certain functions. In the United States, if the first digit you dial is zero, then you are addressing the operator, not selecting a scope. Certain three-digit numbers are valid destinations, and not part of a seven-digit number (like 411 for information).

The naming rules depend on where the naming is done: the phones at another location may have a different convention for getting outside lines, local extensions, etc.
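The way form and content are entangled in these rules shows up as soon as one tries to write them down. The sketch below is a deliberately simplified classifier: the `classify` function, the `SERVICE_CODES` table, and the ten-digit case for an area code plus number are assumptions made for illustration, and real dialing plans have many more cases.

```python
import re

# Assumed, simplified rules; real dialing plans have many more cases.
SERVICE_CODES = {"411": "information"}


def classify(dialed):
    """Classify a dialed digit string under the simplified rules above."""
    digits = re.sub(r"\D", "", dialed)  # ignore spaces, dashes, parentheses
    if digits.startswith("0"):
        return ("operator", digits)  # a function, not a scope selector
    if digits in SERVICE_CODES:
        return ("service", SERVICE_CODES[digits])
    if len(digits) == 4:
        return ("extension", digits)  # valid only inside one local system
    if len(digits) == 7:
        return ("local", digits)  # area code left implicit
    if len(digits) == 10:
        return ("qualified", (digits[:3], digits[3:]))  # area code is the scope
    return ("invalid", digits)
```

Note that the checks on particular digit values have to come before the checks on length: the syntax of a name cannot be decided without looking at its content.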

DELIBERATE NON-UNIQUENESS

Quite often, things don't have individual unique names. This poses no problem when the things aren't individually represented in the system. In the case of parts, for example, we have one named representative for a type of part; the existence of individual instances is reflected only in the “quantity on hand” attribute.

Consider, however, something like a table of organization for a military unit. There may be several slots for clerks, with each slot having the same job description and skill requirements. We want them separately represented; they are the permanent entities in this structure. One of the attributes (or relationships) we want to record for them is the name of the person currently holding the position. When the positions are vacant, the information associated with the entities is identical. When we want to address one of them, e.g., to assign someone to a job, it is sufficient to refer to “any one of the vacant clerk positions.” For this kind of information, the entities do not require unique identification.
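A minimal sketch of the clerk example, assuming a hypothetical `Position` record with no identifier field at all and a hypothetical `assign` function that simply takes whichever vacant slot it finds first:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Position:
    """A slot in the table of organization; note there is no id field."""
    job_title: str
    holder: Optional[str] = None  # name of the current holder, if any


def assign(positions, job_title, person):
    """Fill any one vacant position with the given title."""
    for position in positions:
        if position.job_title == job_title and position.holder is None:
            position.holder = person
            return position
    raise LookupError(f"no vacant {job_title} position")


unit = [Position("clerk"), Position("clerk"), Position("clerk")]
assign(unit, "clerk", "Avery")
```

The two remaining vacant positions compare equal, yet they are separately represented. Python objects do have an in-memory identity, but that is an artifact of the implementation rather than a name anyone uses to address a position.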

It is sometimes asserted that each entity represented in the system must have a unique identifier. I contend that this is a requirement imposed by a particular data model (and it may make many things easier to cope with), but it is not an inherent characteristic of information.

As Kent says, the need for a unique identifier is imposed on us by the database management systems in use today, as well as performance requirements for getting to individual records quickly. Part of our job of organizing information is determining what makes an instance unique within each entity type. If we're lucky, the part has a Part Number, the employee has a Social Security Number, and the order has an Order Number. However, what we frequently find is that theoretically, these are the correct attributes for uniqueness, yet the actual information yields exceptions due to typing errors, obscure business logic, or unavailability of the information.

On the other side of the spectrum, it is possible that there isn't a unique set of attributes for an entity type; in these situations, we often create what is termed a “virtual key.” A virtual key is an attribute that is created by the data modeler to ensure there is something unique within an entity type when there is nothing real by which to retrieve an entity instance. That is, real in the sense of a true piece of business information.
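One common way to produce such a key is a counter owned by the system. The sketch below assumes a hypothetical in-memory `Registry` class; a real database would typically use its own sequence or identity mechanism instead.

```python
import itertools


class Registry:
    """Stores records under system-generated keys that carry no business meaning."""

    def __init__(self):
        self._next_key = itertools.count(1)  # 1, 2, 3, ...
        self._records = {}

    def add(self, record):
        """Store the record and return the key invented for it."""
        key = next(self._next_key)
        self._records[key] = record
        return key

    def get(self, key):
        return self._records[key]


registry = Registry()
first = registry.add({"job_title": "clerk", "holder": None})
second = registry.add({"job_title": "clerk", "holder": None})
```

The two records are identical in every piece of business information, and only the invented key tells them apart.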

Quite often, the technique for giving something a unique qualified name is simply based on an arbitrary relationship to some other object. In effect, the scope becomes the set of things having a particular relationship to a particular object. Consider, for example, the naming of employees’ dependents by the two fields consisting of the employee identification number plus the dependent's first name (the example is taken from [Chen]). In order for such a convention to be effective, a number of conditions must be satisfied.

Uniqueness Within Qualifier

The relationship must confer uniqueness of simple name within relative (i.e., the employee must not have two dependents with the same simple name). Curiously enough, even this might not hold for the given example. A pathological case would occur if the employee had several children with the same name (or is that in fact plausible with adopted children? or after remarriage?). More reasonably, his wife and daughter might have the same name, or his father and son (and grandson, if he was an eligible dependent).

Kent might have been surprised when boxer George Foreman named all of his sons George Edward Foreman—Jr., II, III, IV, V, and VI.
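The constraint can be made concrete by treating the pair (employee identification number, first name) as a dictionary key. The function and exception names below are hypothetical, as are the employee numbers; this is a sketch of the convention, not of any particular system.

```python
class DuplicateQualifiedName(Exception):
    """Raised when a qualified name is already taken within its scope."""


def register_dependent(table, employee_id, first_name, details):
    """Add a dependent keyed by (employee_id, first_name)."""
    key = (employee_id, first_name)
    if key in table:
        raise DuplicateQualifiedName(key)
    table[key] = details


dependents = {}
register_dependent(dependents, 1001, "George", {"relation": "son"})
register_dependent(dependents, 1002, "George", {"relation": "son"})  # different scope: fine

try:
    # A second George under the same employee cannot be named at all.
    register_dependent(dependents, 1001, "George", {"relation": "father"})
    collision = False
except DuplicateQualifiedName:
    collision = True
```

The scheme has no way to record the second George of employee 1001 without changing the naming convention itself.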

Singularity of Qualifier

The relationship does not actually have to be one-to-many for naming purposes, so long as the previous constraint on uniqueness holds for each relative. Thus a person could be a dependent of several employees, and still be uniquely identifiable, so long as no employee has two dependents with the same first name.

However, this situation does give rise to synonyms: a given dependent could be identified by qualification by any of his related employees. This could lead to a number of problems, such as determining when two references to dependents were really references to the same person. And also: when a new employee lists his dependents, how shall we know if any of those dependents are already recorded as dependents of other employees? (Do we add new dependent records, or add synonyms to existing records?)

To avoid such problems, one could require that the identifier have no synonyms. Then dependents could no longer be identified via their related employees—unless we wanted to deny the reality that a person might be a dependent of several employees.
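A small sketch of the synonym problem follows. It assumes a hypothetical `person_id` that already tells us which qualified names denote the same person; having such an identifier is exactly what the qualified-name scheme by itself does not give us.

```python
def synonyms(qualified_names):
    """Group qualified names by the person record they point to.

    qualified_names maps (employee_id, first_name) -> person_id, where
    person_id is a hypothetical internal identifier for the person.
    Only persons reachable under more than one name are returned.
    """
    groups = {}
    for name, person_id in qualified_names.items():
        groups.setdefault(person_id, []).append(name)
    return {pid: sorted(names) for pid, names in groups.items() if len(names) > 1}


names = {
    (1001, "Robin"): "p1",
    (1002, "Robin"): "p1",  # same child, claimed by a second employee
    (1002, "Sam"): "p2",
}
```

Calling `synonyms(names)` reports that person `p1` is known under two qualified names. Without the assumed `person_id`, nothing in the names themselves reveals that the two Robins are one person.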

Existence of Qualifier

A qualifier must exist for each entity occurrence. Therefore the relationship must not be optional; each dependent must have a corresponding employee. If the benefits program were expanded, let's say as a charitable community service, to cover needy people unrelated to any employee, then this system of entity identification would no longer work.

Invariance of Qualifiers

Such a relationship must really be invariant (unmodifiable). The relationship constitutes information that is redundantly scattered about everywhere that this entity is referenced, with the potential for enormous update anomalies if the information can change. (Qualified names thus violate the spirit, if not the letter, of relational third normal form [Codd 72], [Kent 73].) Even this requirement might not be satisfied by the example cited. For tax purposes, two married employees might wish to change which one of them claims which children as dependents; such a change would have to be propagated into the qualifiers in every single reference to those children.
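The cost of a changing qualifier can be illustrated with a toy set of tables that all refer to a dependent by qualified name. The table names, row layout, and `requalify` function are invented for this sketch.

```python
def requalify(tables, old_key, new_key):
    """Replace old_key with new_key everywhere it is used as a reference.

    Returns the number of rows touched. Missing even one table would
    leave a stale reference behind, which is the update anomaly.
    """
    touched = 0
    for rows in tables.values():
        for row in rows:
            if row["dependent"] == old_key:
                row["dependent"] = new_key
                touched += 1
    return touched


tables = {
    "dependents": [{"dependent": (1001, "Robin")}],
    "claims": [
        {"dependent": (1001, "Robin"), "amount": 120},
        {"dependent": (1001, "Robin"), "amount": 45},
    ],
    "enrollments": [{"dependent": (1001, "Robin"), "plan": "dental"}],
}

# The other parent, employee 1002, now claims Robin.
touched = requalify(tables, (1001, "Robin"), (1002, "Robin"))
```

A single change of fact, which parent claims the child, rewrites four rows in three tables here. With an identifier that did not embed the relationship, only the one row recording the relationship would change.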

In document Data and Reality_ A Timeless Pe - William Kent.pdf (Page 64-68)