Part III: Relational SQL
LOCATION *CODE
*START_DATE PARENT_CODE PARENT_START_DATE NAME END_DATE
And here is the definition of the person location intersection entity: PERSON_LOCATION *ID *ID_TYPE CODE LOCATION_START_DATE *START_DATE END_DATE
The first two attributes in the PERSON_LOCATION entity, ID and ID_TYPE, represent the primary key of the person table. The next two attributes, CODE and LOCATION_START_DATE, represent the primary key of the location entity. These attributes are called foreign keys, because they point to the primary key of other entities. The primary key of the PERSON_LOCATION entity consists of the primary key from the person entity plus an additional START_DATE (see the fifth column). It is not necessary to include the location entity's primary key in the primary key
definition for the intersection, because the person's ID and type, along with the start date of the assignment, make each intersection entry unique. Also, not including the location's primary key enforces a business rule, which prevents a person from being represented as working in more than one place at a time.
8.1.3 Determining Relationships Between Entities
Although I've not talked about them directly, I`ve been thinking about the relationships between the entities all along. It's hard not to. In the introductory paragraph, I stated that a person works at a location, in a job, for an organization, and is either an employee or contractor. This statement defined four relationships. When I thought more about it, I decided I needed four intersection entities, one each between the PERSON entity and the other four entities: LOCATION, POSITION, ORGANIZATION, and STATUS. This is because I will keep a history, not just the current value, of each relationship. Each intersection entity actually represents two relationships, for a total of eight. There are also the 2 hierarchical relationships, so at this point I'm aware of the following 10 relationships: • PERSON to PERSON_LOCATION • LOCATION to PERSON_LOCATION • PERSON to PERSON_POSITION • POSITION to PERSON_POSITION • PERSON to PERSON_ORGANIZATION • ORGANIZATION to PERSON_ORGANIZATION • PERSON to PERSON_STATUS
• STATUS to PERSON_STATUS • ORGANIZATION to ORGANIZATION • LOCATION to LOCATION
All that's left to consider is what is called cardinality. Cardinality refers to the number of occurrences of any one entity that can point to occurrences of another, related, entity. For example, zero or more persons can have zero or more person location assignments. And zero or more locations can be assigned to zero or more person location assignments. Cardinality is important because it refines primary key definitions and defines business rules.
In practice, you may end up determining relationships before you identify attributes and primary keys, but analysis is an iterative process, so which comes first is not that important. What is important is that you test your analysis against examples of real-world data so you can uncover any flaws before you start creating any DDL.
8.2 Refining the Analysis
The use of real-world information in the primary key, as we just covered, is what I call a smart key solution. A smart key is a key composed of real-world data values. This is how most entity- relationship analysis was done in the 1980s. We, the programming community at the time, identified a set of entities that organized and described how information was used and how it related to the real world. We used real-world data values as the primary keys for our tables. But this technique of using real-world information to uniquely identify entries was flawed. As with all things, analysts gained experience over time, and with hard-earned experience, learned a better way to define an entity's primary key.
8.2.1 Defining Dumb Primary Keys
Here's what we learned. We discovered two flaws when using real-world information in a primary key. First, over time, the users of the applications we built no longer wanted to uniquely identify an entry by the real-world information that had been used. Second, they sometimes wanted to rename the real-world values used in a primary key. Since real-world information was used in primary keys, and therefore was referenced in foreign keys, it was not possible to change this real-world information without a major migration of the data in the database. If we changed a primary key in a row of one table, we had to change it in all the rows in related tables. Sometimes, this also led to major modifications to our applications.
The solution to this problem was to use dumb primary keys. Dumb primary keys consist of just a single numeric attribute. This attribute is assigned a unique value by the database whenever a new entry is created for an entity. With Oracle, a type of schema element known as a sequence can generate unique primary keys for primary entities such as PERSON and LOCATION. Dumb primary keys are then used to establish the relationship between entities, while a unique index is created against the former smart primary key attributes to create a unique key against real-world information. In effect, I end up with both internal (dumb) and external (smart) primary keys. Employing this technique of using dumb keys, reworking our person entity, and adding a dumb key attribute called PERSON_ID, I get the following new definition for the person entity: PERSON *PERSON_ID ID ID_TYPE LAST_NAME FIRST_NAME BIRTH_DATE
MOTHERS_MAIDEN_NAME
Now the person entity has one attribute that defines an entry's uniqueness. This attribute is