• No results found

Applying the Normalization Process

The normalization process is applied to each user view collected during earlier design stages. Some people find it easier to apply the first step (choosing a primary key) to each user view, then the next step (converting to first normal form), and so forth. Other

148

Databases Demystified

Color profile: Generic CMYK printer profile Composite Default screen

people prefer to take the first user view and apply all the normalization steps to it, then the next user view, and so forth. With practice, you’ll know which one works best for you, but whichever you do, you must beverysystematic in your approach, lest you miss something. Our example has only one user view (the Acme Industries invoice), so this may seem a moot point, but there are two practice problems toward the end of the chapter containing several user views each, so you will be able to try this out soon enough. Using dry-erase markers or chalk on a wall-mounted board is most helpful because you can easily erase and rewrite relations as you go.

We start with each user view being a relation, which means we represent it as if it is a two-dimensional table. As you work through the normalization process, you will be rewriting existing relations and creating new ones. Some find it useful to draw the relations with sample tuples (rows) of data in them to assist in visualizing the work. If you take this approach, be certain that your data represents real-world situations. For example, you might not think of two customers having exactly the same name in our invoice example, so then your normalization results might be incorrect. There- fore,alwaysthink of as many possibilities as you can when using this approach. Fig- ure 6-3 shows the information from our invoice example (Figure 6-2) represented in tabular form. Only one invoice is shown here, but many more could be filled in to show examples of multiple invoices per customer, multiple customers, the same product on multiple invoices, and so on.

You probably noticed that each invoice has many line items. This will be essential information when we get to first normal form. In Figure 6-3, multiple values are placed in the cells for the columns that hold data from the line items. We call these

CHAPTER 6

Logical Database Design Using Normalization

149

Figure 6-3 Acme Industries invoice represented in tabular form Color profile: Generic CMYK printer profile

multivalued attributesbecause they have multiple values for at least some tuples (rows) in the relation. If we were to construct an actual database table in this manner, our ability to use a language such as SQL to query those columns would be very lim- ited. For example, finding all orders that contained a particular product would re- quire us to parse the column data with a LIKE operator. Updates would be equally awkward because SQL was not designed to handle multivalued columns. Worst of all, a delete of one product from an invoice would require an SQL UPDATE instead of a DELETE because we would not want to delete the entire invoice. As we look at first normal form later in this chapter, you will see how to work around this problem.

Figure 6-4 shows another way we could organize a relation using the invoice shown in Figure 6-2. Here, the multivalued column data has been placed in separate rows and the other columns’ data has been repeated to match. The obvious problem here is all the repeated data. For example, the customer’s name and address are re- peated for each line item on the invoice, which is not only wasteful of resources, but also exposes us to inconsistencies whenever the data is not maintained in the same way (for example, we update the city for one line item but not all the others).

Rewriting user views into tables with representative data is a tedious and time- consuming process. For this reason, we’ll simply write the attributes as a list and visualize them in our minds as two-dimensional tables. This takes some practice and some training of the mind, but once mastered, speeds your ability to normalize rela- tions several fold over writing out exhaustive examples. Here is the list for the in- voice example from Figure 6-2:

150

Databases Demystified

Figure 6-4 Acme Invoice represented without multivalued attributes Color profile: Generic CMYK printer profile

CHAPTER 6

Logical Database Design Using Normalization

151

Demystified / Databases Demystified / Oppel/ 225364-9 / Chapter 6

INVOICE: Customer Number, Customer Name, Customer Address, Customer City, Customer State, Customer Zip Code, Customer Phone, Terms, Ship Via, Order Date, Product Number, Product Description, Quantity, Unit Price, Extended Amount, Total Order Amount For clarity, a name for the relation has been added, with the relation name in all capital letters and separated from the attributes with a colon. This is the convention we will use for the remainder of this chapter. However, if another technique works better for you, by all means use it. The best news of all is that no matter which repre- sentation we use (Figure 6-3, Figure 6-4, or the preceding list), if we properly apply the normalization process and its rules, we will arrive at the same database design.