• Dimensional level. A dimensional level is one level within the hierarchy of a dimension, such as Month within the Calendar dimension. Dimensional levels are used to facilitate calculating measures at the level(s) the business needs to see. They are not built based on how the business works.
• Dimensional attribute. The properties within a dimension, such as Ice Cream Container Height in the Ice Cream Container dimension.
We've emphasized several times that relational modeling captures how something works and dimensional modeling captures what is being monitored. Let's look at this distinction in another way. There are three main differences between relational and dimensional models: focus, lines, and scope:
• Focus. A dimensional model is a data model whose only purpose is to allow efficient and user-friendly filtering, sorting, and summing of measures. A relational model, on the other hand, focuses on supporting a business process. Dimensional models are appropriate when there is a need to massage numbers, such as by summing or averaging. The reason the dimensional model should be limited to numbers is that its design allows for easy navigation up and down hierarchy levels. When traversing hierarchy levels, measures may need to be recalculated for the hierarchy level. For example, a Gross Sales Amount of $5 on a particular date might be $100 for the month in which that date belongs.
• Lines. The relationship lines on a dimensional model represent navigation paths instead of business rules, as in a relational model. Let creativity drive your dimensional structures. The relationships in a dimensional hierarchy do not have to mimic their relational counterparts. You should build dimensions to meet the way the business users think.
• Scope. The scope of a dimensional model is a collection of related measures that together address a business concern, whereas in a relational model, the scope may be a broad business process, such as order processing or account management. For example, the metrics Number of Product Complaints and Number of Product Inquiries can be used to gauge product satisfaction.
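The recalculation described under Focus, where a measure is restated as we move up a hierarchy level, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample dates and amounts are invented, and roll_up_to_month is a hypothetical helper name, not part of any particular tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical date-level facts: (sale date, Gross Sales Amount in dollars).
daily_sales = [
    (date(2024, 1, 5), 5),
    (date(2024, 1, 12), 45),
    (date(2024, 1, 28), 50),
    (date(2024, 2, 3), 20),
]


def roll_up_to_month(rows):
    """Sum a date-level measure up to the Month level of the Calendar hierarchy."""
    totals = defaultdict(int)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly_sales = roll_up_to_month(daily_sales)
```

Here the $5 recorded on January 5 contributes to a January total of $100, which mirrors the Gross Sales Amount example above.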
Normalization Explained
When I turned 12, I received a trunk full of baseball cards as a birthday present from my parents. I was delighted, not just because there may have been a Hank Aaron or Pete Rose buried somewhere in that trunk, but because I loved to organize the cards. I categorized each card according to year and team. Organizing the cards in this way gave me a deep understanding of the players and their teams. To this day, I can answer many baseball card trivia questions.
Normalization, in general, is the process of applying a set of rules with the goal of organizing something. I was normalizing the baseball cards according to year and team. We can also apply a set of rules and normalize the data elements within our organizations. The rules are based on how the business works, which is why normalization is the primary technique used in building the relational logical data model.
Just as those baseball cards lay unsorted in that trunk, our companies have huge numbers of data elements spread throughout departments and applications. The rules applied to normalizing the baseball cards entailed first sorting by year, and then by team within a year. The rules for normalizing our data elements can be boiled down to a single sentence:
Make sure every data element is single-valued and provides a fact completely and only about its primary key. The terms 'single-valued', 'provides a fact', 'completely', and 'only' require more of an explanation.
'Single-valued' means a data element must contain only one piece of information. If Consumer Name contains Consumer First Name and Consumer Last Name, for example, we must split Consumer Name into two data elements - Consumer First Name and Consumer Last Name.
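As a rough sketch of the split just described, the following Python function separates a combined Consumer Name into two single-valued elements. The function name and the rule of splitting at the first space are assumptions made for illustration only. Real names often need more careful handling, which is exactly the kind of detail the business questions discussed later would uncover.

```python
def split_consumer_name(consumer_name):
    """Split a combined name into (first name, last name) at the first space.

    Assumption for illustration: the name has the form "First Last".
    A name with no space is returned as a first name with an empty last name.
    """
    first_name, _, last_name = consumer_name.strip().partition(" ")
    return first_name, last_name


consumer_first_name, consumer_last_name = split_consumer_name("Mary Smith")
```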
'Provides a fact' means that a given primary key value will always return no more than one of every data element that is identified by this key. If a Customer Identifier value of '123', for example, returns three customer last names ('Smith', 'Jones', and 'Roberts'), this violates this part of the normalization definition.
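One way to look for this kind of violation in sample data is to group values by key and flag any key that returns more than one distinct value. The sketch below assumes the data arrives as simple (Customer Identifier, Customer Last Name) pairs. The rows and the function name keys_with_multiple_values are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample: (Customer Identifier, Customer Last Name) pairs.
customer_rows = [
    ("123", "Smith"),
    ("123", "Jones"),
    ("123", "Roberts"),
    ("456", "Lee"),
]


def keys_with_multiple_values(rows):
    """Return each key that maps to more than one distinct value."""
    values_by_key = defaultdict(set)
    for key, value in rows:
        values_by_key[key].add(value)
    return {key: values for key, values in values_by_key.items() if len(values) > 1}


violations = keys_with_multiple_values(customer_rows)
```

In this sample, only Customer Identifier '123' is flagged, because it returns three different last names.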
'Completely' means that the minimal set of data elements that uniquely identify an instance of the entity is present in the primary key. If, for example, there are two data elements in an entity's primary key, but only one is needed for uniqueness, the data element that is not needed for uniqueness should be removed from the primary key.
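A simple check along these lines is to test whether any smaller subset of the proposed key columns is already unique in a sample of the data. The following sketch assumes rows are held as dictionaries. The column names and helper names are hypothetical. Keep in mind that uniqueness in a sample only suggests a candidate for removal; whether the smaller key is truly sufficient is a business question.

```python
from itertools import combinations


def is_unique(rows, columns):
    """Return True if no two rows share the same values for the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def smaller_unique_keys(rows, key_columns):
    """Return the proper subsets of key_columns that are unique in the sample rows."""
    found = []
    for size in range(1, len(key_columns)):
        for subset in combinations(key_columns, size):
            if is_unique(rows, subset):
                found.append(subset)
    return found


# Hypothetical sample: a two-part key where Order Number alone is unique.
orders = [
    {"order_number": "A1", "order_date": "2024-01-05"},
    {"order_number": "A2", "order_date": "2024-01-05"},
    {"order_number": "A3", "order_date": "2024-01-06"},
]

candidates = smaller_unique_keys(orders, ("order_number", "order_date"))
```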
'Only' means that each data element must provide a fact about the primary key and nothing else. That is, there can be no hidden dependencies. For example, assume an Order is identified by an Order Number. Within Order, there are many data elements, including Order Scheduled Delivery Date, Order Actual Delivery Date, and Order On Time Indicator. Order On Time Indicator contains either a 'Yes' or a 'No', providing a fact about whether the Order Actual Delivery Date is less than or equal to the Order Scheduled Delivery Date. Order On Time Indicator, therefore, provides a fact about Order Actual Delivery Date and Order Scheduled Delivery Date, not directly about Order Number, so it should be removed from the normalized model. Order On Time Indicator is an example of a derived data element, meaning it is calculated. Derived data elements are removed from a normalized model.
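To see why the indicator is derived, consider this minimal Python sketch, in which Order stores only the two delivery dates and calculates the indicator on request. The class layout and attribute names are assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    """A hypothetical Order holding only facts about its Order Number."""
    order_number: str
    scheduled_delivery_date: date
    actual_delivery_date: date

    @property
    def on_time_indicator(self):
        """Derive 'Yes' or 'No' from the two delivery dates instead of storing it."""
        if self.actual_delivery_date <= self.scheduled_delivery_date:
            return "Yes"
        return "No"


on_time_order = Order("1001", date(2024, 3, 10), date(2024, 3, 9))
late_order = Order("1002", date(2024, 3, 10), date(2024, 3, 12))
```

Because the indicator is calculated from the two dates, it cannot drift out of step with them, which is one reason to keep it out of the normalized model.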
So, a general definition for normalization is that it is a series of rules for organizing something. The series of rules can be summarized as: Every data element is single-valued and provides a fact completely and only about its primary key. An informal definition I frequently use for normalizing is: A formal process of asking business questions. We cannot determine if every data element is single-valued and provides a fact completely and only about its primary key unless we understand the data. To understand the data we usually need to ask lots of questions. Even for an apparently simple data element such as Phone Number, for example, we can ask many questions:
• Whose phone number is this?
• Do you always have to have a phone number?
• Can you have more than one phone number?
• Do you ever recognize the area code as separate from the rest of the phone number?
• Do you ever need to see phone numbers outside a given country?
• What type of phone number is this? That is, is it a fax number, mobile number, etc.?
To ensure that every data element is single-valued and provides a fact completely and only about its primary key, we apply a series of rules or small steps, where each step (or level of normalization) checks something that moves us towards our goal. Most data professionals would agree that the full set of normalization levels is the following:
• first normal form (1NF)
• second normal form (2NF)
• third normal form (3NF)
• Boyce/Codd normal form (BCNF)
• fourth normal form (4NF)
• fifth normal form (5NF)
Each level of normalization includes the lower levels of rules that precede it. If a model is in 5NF, it is also in 4NF, BCNF, and so on. Even though there are higher levels of normalization than 3NF, many interpret the term "normalized" to mean 3NF. This is because the higher levels of normalization (that is, BCNF, 4NF, and 5NF) cover specific situations that occur much less frequently than the first three levels. Therefore, to keep things simple, this chapter focuses only on first through third normal forms.
Normalization provides a number of important benefits (NOTE: In this section, the term data model may also represent the physical database that will be implemented from the data model):
• Stronger understanding of the business. The process of normalization ensures that we ask many questions about the data elements so that we know we are assigning them to entities correctly. The answers to our questions give us insight into how things work. Recently, for example, I normalized a Claims and Provider file for a medium-sized insurance company. I kept a record of my questions and the responses to them. After normalization was complete, I realized I had asked over 100 questions! I learned quite a bit about claims and providers from this process.
• Greater application stability. Normalization leads to a model that mimics the way in which the business works. As the business goes about its daily operations, the application receives data according to the rules that govern the business. The normalized model developed for the application leads to a database that has been structured to match these rules. Therefore, the system runs smoothly as long as the rules in the business match the rules documented throughout the normalization process. If, for example, more than one person can own one or more accounts, the model, and therefore the application, will accommodate this fact.
• Less data redundancy. Each level of normalization removes a type of data redundancy from the model. Data redundancy occurs when the same information appears more than once in the same model. As redundancy is removed, changing the data becomes a quicker process, because there is less of it to update or insert. If Person Last Name appears once on a model and Mary's last name changes, the application only has to update her last name in one place.
• Better data quality. By reducing redundancy and enforcing business rules through relationships, the data are less likely to get out of sync or violate these business rules. If an account must have at least one account owner, the data model can prevent accounts with invalid or missing account owners from occurring.
• Faster building of new models. A degree of common sense is applied to the place to which data elements are assigned during the normalization process. Therefore, it becomes