Introduction to normalization
Lecture 4
Instructor Anna Sidorova
Introduction to normalization
Agenda
Presentation
Review of relational models, in class exersise Introduction to normalization
In-class exercises Discussion of HW2
Next class
Review for the midterm HW 2 is due
March 7 – midterm exam
Based on Hoffer, Prescott and Topi Modern Database Management, (c) Prentice Hall 2009
HW 2
Chapter 4, Problem 6 – develop a relational schema
Convert ERD for Ch. 2, Problem 20 (a part of your HW1) , ( p y )
into a relational schema (must be based on the correct solution)
Chapter 4, Problems 7 and 8 (we will discuss the relevant
material next class)
Handout – normalization exercises
Based on Hoffer, Prescott and Topi Modern Database Management, (c) Prentice Hall 2009
Review of relational data models
Review of relational data models
Different people have different
views of the database these
Figure 2-7 Three-schema architecture
database…these are the external
schema The internal schema is the d l i underlying design and implementation
Relation
Relation
Definition: A relation is a named, twoDefinition: A relation is a named, two--dimensional table of data dimensional table of data
Table consists of rows (records) and columns (attribute or field)Table consists of rows (records) and columns (attribute or field)
Requirements for a table to qualify as a relation:Requirements for a table to qualify as a relation:
It must have a unique nameIt must have a unique name
Every attribute value must be atomic (not Every attribute value must be atomic (not multivaluedmultivalued, not composite), not composite)
Every row must be unique (can’t have two rows with exactly the same values for all Every row must be unique (can’t have two rows with exactly the same values for all their fields)
their fields)
Attributes (columns) in tables must have unique namesAttributes (columns) in tables must have unique names
The order of the columns must be irrelevantThe order of the columns must be irrelevant
The order of the rows must be irrelevantThe order of the rows must be irrelevant
NOTE: all relations are in 1stNormal form
Translating ERD into relational schema
Map each entity into a relation Map each weak entity into a relation (include the identifier of the strong entity t f th i k )
as a part of the primary key)
Map each multivalued attribute into a relation (include the identifier of the
entity as a part of the primary key)
Map many-to-many relationships and associative entities into a relation
Represent one-to-one and one-to-many relationships using foreign keys.
Normalization
Normalization
Learning Objectives
Define NormalizationDefine NormalizationDefine 1st, 2ndand 3rdNormal Forms
Normalization: Definitions
Normalization is a method used to validate and improve a logical
design so that it satisfies certain constraints that avoid unnecessary duplication of data
The process of decomposing relations with anomalies to produce
smaller, well-structured relations
9.11 9.11
Well
Well--Structured Relations
Structured Relations
A relation that contains minimal data redundancy and allows users A relation that contains minimal data redundancy and allows users
to insert, delete, and update rows without causing data to insert, delete, and update rows without causing data to se t, e ete, a up ate ows w t out caus g ata to se t, e ete, a up ate ows w t out caus g ata inconsistencies
inconsistencies
Goal is to avoid anomaliesGoal is to avoid anomalies
Insertion AnomalyInsertion Anomaly––adding new rows forces user to create duplicate dataadding new rows forces user to create duplicate data
Deletion AnomalyDeletion Anomaly––deleting rows may cause a loss of data that would be deleting rows may cause a loss of data that would be needed for other future rows
needed for other future rows
Modification AnomalyModification Anomaly––changing data in a row forces changes to other changing data in a row forces changes to other rows because of duplication
rows because of duplication
12
Example
Example–
–Figure 5
Figure 5--2b
2b
13
Question–Is this a relation? Answer–Yes: Unique rows and no
multivalued attributes
Question–What’s the primary key? Answer–Composite: Emp_ID,
Course_Title
Anomalies in this Table
Anomalies in this Table
InsertionInsertion––can’t enter a new employee without having the employee take a can’t enter a new employee without having the employee take a
class class class class
DeletionDeletion––if we remove employee 140, we lose information about the if we remove employee 140, we lose information about the
existence of a Tax Acc class existence of a Tax Acc class
ModificationModification––giving a salary increase to employee 100 forces us to giving a salary increase to employee 100 forces us to
update multiple records update multiple records
Wh d h li i ?
Why do these anomalies exist?
Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary
Normalization Process
The goal is to bring each relation into the Third Normal
Form.
The process bringing a relation into the 3rdNormal Form
Goes through stages. 1stNormal Form 2ndNormal Form
3rdNormal Form
Functional Dependencies
Functional Dependency A particular relationship between two attributes. For a given p p g
relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B
Instances (or sample data) in a relation do not prove the
existence of a functional dependency
Knowledge of problem domain is most reliable method for g p
identifying functional dependency
9.16 9.16
Functional Dependencies: Notations in
Problems
A B Attribute B is functionally dependent on attribute A (A determines B) A, B C Attributes A and B together
determine attribute C
A B, C Both attributes, B and C are determined by (functionally determined by (functionally dependent on) attribute A
Functional Dependencies
We can draw functional dependencies between attributes of
l f ll a relation as follows:
STUDENT
Stud_ID F_Name L_Name E-mail
111 Mary Jones [email protected]
Important Definitions
Multivalued Attributes (repeating groups) – non-key
attributes or groups of non-key attributes the values of which are l d f d b d l d l
not uniquely identified by (directly or indirectly) (not
functionally dependent on) the value of the Primary Key (or its part).
STUDENT
Stud_ID Name Course_ID Units
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00 101 Lennon MSI 250 3.00 MSI 415 3.00 125 Jonson MSI 331 3.00
Important Definitions
A relation is unnormalized (not in the 1stNormal Form) if
h l l d b
STUDENT
Stud ID Name Course ID Units it has multivalued attributes or repeating groups. Repeating
Group
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Important Definitions
A relation is in the 1stNormal Form if it has no multivalued
b
STUDENT
Stud ID Name Course ID Units attributes or repeating groups.
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Important Definitions
Partial Dependency – when an non-key attribute is
d d b b h h l f COMPOSITE
determined by a part, but not the whole, of a COMPOSITE primary key.
CUSTOMER
Cust ID Name Order ID
Partial Dependency
Cust_ID Name Order_ID
101 AT&T 1234
101 AT&T 156
Important Definitions
A relation is NOT in the 2ndNormal Form if it has partial
d d
dependencies.
CUSTOMER
Cust_ID Name Order_ID
Partial Dependency 101 AT&T 1234 101 AT&T 156 125 Cisco 1250
Important Definitions
A relation is in the 2ndNormal Form if it is in the 1stNormal
h l d d
Form AND has no partial dependencies.
EMPLOYEE
E ID F N L N D t ID D t N
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
Important Definitions
Transitive Dependency – when a non-key attribute
d h k b
determines another non-key attribute.
EMPLOYEE
E ID F N L N D t ID D t N
Transitive Dependency
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sara Smith 2 Mktg
Important Definitions
A relation is NOT in the 3rdNormal Form if it has
transitive dependencies.
EMPLOYEE
E ID F N L N D t ID D t N
Transitive Dependency
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
Important Definitions
A relation is in the 3rdNormal Form if it is in the 2nd
Normal Form and has no transitive dependencies.
EMPLOYEE
E ID F N L N D t ID
Emp_ID F_Name L_Name Dept_ID
111 Mary Jones 1
122 Sara Smith 2
Normal Forms: Review
Unnormalized –There are multivalued attributes or
repeating groups
1 NF – No multivalued attributes or repeating groups. 2 NF – 1 NF plus no partial dependencies
3 NF – 2 NF plus no transitive dependencies
9.28 9.28
Example 1: Determine NF
ISBN Title ISBN Publisher
All attributes are directly or indirectly determined ISBN Publisher
Publisher Address
BOOK
ISBN Title Publisher Address
by the primary key; therefore, the relation is
at least in 1 NF
ISBN Title Publisher Address
Example 1: Determine NF
ISBN Title ISBN Publisher
The relation is at least in 1NF. There is no COMPOSITE ISBN Publisher
Publisher Address
BOOK
ISBN Title Publisher Address
There is no COMPOSITE primary key, therefore there can’t be partial dependencies.
Therefore, the relation is at least in 2NF
Example 1: Determine NF
ISBN Title ISBN Publisher
Publisher is a non-key attribute, and it determines Address, ISBN Publisher
Publisher Address
BOOK
ISBN Title Publisher Address
another non-key attribute. Therefore, there is a transitive dependency, which means that
the relation is NOT in 3 NF.
ISBN Title Publisher Address
Example 1: Determine NF
ISBN Title ISBN Publisher
We know that the relation is at least in 2NF, and it is not in 3 ISBN Publisher
Publisher Address
BOOK
ISBN Title Publisher Address
NF. Therefore, we conclude that the relation is in 2NF.
Example 1: Determine NF
ISBN TitleISBN P bl h
In your solution you will write the following justification: ISBN Publisher Publisher Address following justification: 1) No M/V attributes, therefore at least 1NF 2) No partial dependencies, therefore at least 2NF 3) There is a transitive dependency
(Publisher Address), therefore,
not 3NF
Conclusion: The relation is in 2NF BOOK
ISBN Title Publisher Address
Example 2: Determine NF
Product_ID Description
ORDER
All attributes are directly or indirectly determined by the primary key; therefore, the relation
is at least in 1 NF
Example 2: Determine NF
Product_ID Description
The relation is at least in 1NF.
There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of PK,
determines Description; hence, there is a partial dependency. Therefore, the relation is not 2NF. No
sense to check for transitive dependencies! ORDER
Order_No Product_ID Description
Example 2: Determine NF
Product_ID Description
ORDER
We know that the relation is at least in 1NF, and it is not in 2 NF. Therefore, we conclude that the
relation is in 1 NF.
Example 2: Determine NF
Product_ID Description
In your solution you will write the following justification:
1) No M/V attributes, therefore at least 1NF 2) There is a partial dependency (Product_ID Description), therefore
not in 2NF
Conclusion: The relation is in 1NF ORDER
Order_No Product_ID Description
Example 3: Determine NF
Part_ID Description Part ID Price determined by the primaryComp_ID and No are not
PART
_
Part_ID, Comp_ID No
determined by the primary key; therefore, the relation is
NOT in 1 NF. No sense in looking at partial or transitive
dependencies.
Example 3: Determine NF
Product_ID Description Product ID Price
In your solution you will write the following justification: 1) There are M/V attributes; Product_ID Price
Part_ID, Comp_ID No
) ;
therefore, not 1NF Conclusion: The relation is
unnormalized.
PART
Part_ID Descr Price Comp_ID No
Bringing a Relation to 1NF
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Bringing a Relation to 1NF
Option 1: Make a determinant of the repeating group (or a
multivalued attribute) a part of the primary key multivalued attribute) a part of the primary key.
STUDENT
Stud_ID Name Course_ID Units
Composite Primary Key 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Bringing a Relation to 1NF
Option 2: Remove the entire repeating group from the relation. Create another relation which would contain all the attributes of the repeating group, plus the primary key from the first relation. In this new relation,
h i k f h i i l l i d h d i f h
the primary key from the original relation and the determinant of the repeating group will comprise a primary key.
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
Bringing a Relation to 1NF
STUDENT STUDENT_COURSE Stud_ID Name 101 Lennon 101 Lennon 125 JonsonStud_ID Course Units
101 MSI 250 3 101 MSI 415 3 125 MSI 331 3
Bringing a Relation to 2NF
STUDENT
Stud_ID Name Course_ID Units
Composite Primary Key
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Bringing a Relation to 2NF
Goal: Remove Partial DependenciesPartial
STUDENT
Stud_ID Name Course_ID Units
Composite Primary Key Dependencies 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Bringing a Relation to 2NF
Remove attributes that are dependent from the part but not the whole of the
primary key from the original relation. For each partial dependency, create a new relation with the corresponding part of the primary key from the original new relation, with the corresponding part of the primary key from the original as the primary key.
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00
Bringing a Relation to 2NF
CUSTOMER
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00 STUDENT_COURSE Stud_ID Course_ID 101 MSI 250 101 MSI 415 125 MSI 331 COURSE STUDENT Course_ID Units MSI 250 3.00 MSI 415 3.00 MSI 331 3.00 Stud_ID Name 101 Lennon 101 Lennon 125 Jonson
Bringing a Relation to 3NF
Goal: Get rid of transitive dependencies.EMPLOYEE
Emp ID F Name L Name Dept ID Dept Name
Transitive Dependency
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
Bringing a Relation to 3NF
Remove the attributes, which are dependent on a non-key
attributes from the original relation. For each transitive d d l i i h h k ib dependency, create a new relation with the non-key attributes which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent.
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sara Smith 2 Mktg
Bringing a Relation to 3NF
EMPLOYEE
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct 122 Sara Smith 2 Mktg EMPLOYEE
Emp_ID F_Name L_Name Dept_ID
111 Mary Jones 1 122 Sara Smith 2 DEPARTMENT Dept_ID Dept_Name 1 Acct 2 Mktg
Other Normal Forms
(from Appendix B)
Boyce-Codd NF
ll d d d k h d h
All determinants are candidate keys…there is no determinant that is
not a unique identifier
Usually, if a relation is in #NF it is in the BCNF, except when a part of
the primary key is determined by a non-key attribute.
4
thNF and 5
thNF – used primarily for theoretical purposes
Merging Relations
View Integration – Combining entities from multiple ER models into
common relations
Issues to watch out for when merging entities from different ER
models:
Synonyms – two or more attributes with different names but same meaning Homonyms – attributes with same name but different meanings
Transitive dependencies – even if relations are in 3NF prior to merging, they may not be after merging
Enterprise Keys – advice from some
experts
Primary keys that are unique in the whole database, not just
ithi i l l ti within a single relation
Corresponds with the concept of an object ID in object-oriented
In class exercise
See handoutBased on Hoffer, Prescott and Topi Modern Database Management, (c) Prentice Hall 2009