DBMS
Database Normalization
DBMS
Database normalization is the process of removing redundant data from your tables in to improve storage efficiency data integrity (accuracy and consistency) and efficiency, data integrity (accuracy and consistency), and scalability ( accommodates changes).
In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for ( ), g
converting a given database between them.
Normalization generally involves splitting existing tables Normalization generally involves splitting existing tables
into multiple ones, which must be re-joined or linked each time a query is issued.
History
DBMS
Edgar F. Codd first proposed the process of
normalization and what came to be known as the 1st normalization and what came to be known as the 1st normal form in his paper A Relational Model of Data for Large Shared Data Banks Codd stated:
“There is, in fact, a very simple elimination procedure which we shall call normalization. Through g
decomposition nonsimple domains are replaced by ‘domains whose elements are atomic
(nondecomposable) values.’”
Normal Form
DBMS
Edgar F Codd originally established three normal forms: Edgar F. Codd originally established three normal forms:
1NF, 2NF and 3NF. There are now others that are
ll d b 3NF i id l id d b
generally accepted, but 3NF is widely considered to be sufficient for most applications. Most tables when
reaching 3NF are also in BCNF (Boyce-Codd Normal Form).
Update Anomalies
DBMS
Relations that have redundant data may have
problems called update anomalies, which are
p
p
classified as ,
Insertion anomalies
D l ti
li
Deletion anomalies
Modification anomalies
Unnormalized Table
Insertion anomalies
DBMS
To insert details of a new branch that currently has no To insert details of a new branch that currently has no
members of staff into the StaffBranch table, it’s
ll i h ff l d l
necessary to enter nulls into the staff-related columns, such as staffNo.
However, as staffNo is the primary key for the
StaffBranch table, attempting to enter nulls for staffNo, p g violates entity integrity, and is not allowed.
Deletion anomalies
DBMS
If we delete a record from the StaffBranch table that If we delete a record from the StaffBranch table that
represents the last member of staff located at a branch,
h d il b h b h l l f h
the details about that branch are also lost from the database.
For example, if we delete the record for staff Art Peters (S0415) from the StaffBranch table, the details relating to
( ) , g
Update anomalies
DBMS
If we want to change the value of one of the columns of If we want to change the value of one of the columns of
a particular branch in the StaffBranch table, for example
h l h b f b h B001 d
the telephone number for branch B001, we must update the records of all staff located at that branch.
If this modification is not carried out on all the appropriate records of the StaffBranch table, the pp p , database will become inconsistent.
The Process of Normalization
DBMS
• Normalization is often executed as a series of steps. Each p step corresponds to a specific normal form that has known properties.
• As normalization proceeds, the relations become
progressively more restricted in format, and also less
l bl t d t li
vulnerable to update anomalies.
• For the relational data model, it is important to recognize , p g that it is only first normal form (1NF) that is critical in
creating relations. All the subsequent normal forms are optional
First Normal Form (1NF)
DBMS
Unnormalized form (UNF) Unnormalized form (UNF)
A table that contains one or more repeating groups.
ClientNo cName propertyNo pAddress rentStart rentFinish rent ownerNo oName ClientNo cName propertyNo pAddress rentStart rentFinish rent ownerNo oName
CR76 John kay PG4 PG16 6 lawrence St,Glasgow 5 Novar Dr
1-Jul-00 31-Aug-01 350 CO40
Tina Murphy
Tony PG16 5 Novar Dr,
Glasgow 1-Sep-02 1-Sep-02 450 CO93 Shaw
PG4 6 lawrence
St,Glasgow 1-Sep-99 10-Jun-00 350 CO40
Tina Murphy CR56 Aline Stewart PG36 PG16 2 Manor Rd, Glasgow 5 Novar Dr, 10-Oct-00 1-Nov-02 1-Dec-01 1-Aug-03 370 450 CO93 CO93 Tony Shaw Tony Shaw PG16
Glasgow 1-Nov-02 1-Aug-03 450 CO93
Definition of 1NF
DBMS
First Normal Form is a relation in which the intersection of each row and column contains one and only one value
each row and column contains one and only one value.
There are two approaches to removing repeating groups from unnormalized tables:
1 Removes the repeating groups by entering appropriate 1. Removes the repeating groups by entering appropriate
data in the empty columns of rows containing the repeating data.
2. Removes the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
First Normal Form
DBMS
Steps to Remove Repeating Groups
ªR th ti l f th i i l
ªRemove the repeating columns from the original
unnormalized table.
ªCreate a new table with the primary key of the
base table and the repeating columns.
ªAdd another appropriate column to the primary
key, which ensures uniqueness.
ªCreate a foreign key in the new table to link back
1NF ClientRental relation with the first
approach
DBMS
pp
The ClientRental relation is defined as follows,
ClientRental ( clientNo, propertyNo, cName, pAddress, rentStart, ( , p p y , , p , , rentFinish, rent, ownerNo, oName)
ClientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName CR76 PG4 John
Kay
6 lawrence
St,Glasgow 1-Jul-00 31-Aug-01 350 CO40
Tina Murphy
CR76 PG16 John Kay
5 Novar Dr,
Glasgow 1-Sep-02 1-Sep-02 450 CO93
Tony Shaw
CR56 PG4 Aline Stewart
6 lawrence
St,Glasgow 1-Sep-99 10-Jun-00 350 CO40
Tina Murphy
CR56 PG36 Aline Stewart
2 Manor Rd,
Glasgow 10-Oct-00 1-Dec-01 370 CO93
Tony Shaw
CR56 PG16 Aline Stewart
5 Novar Dr,
Glasgow 1-Nov-02 1-Aug-03 450 CO93
Tony Shaw Stewart Glasgow
1NF ClientRental relation with the second
approach
DBMS
pp
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart,
i i h )
rentFinish, rent, ownerNo, oName)
ClientNo cName ClientNo cName CR76 John Kay CR56 Aline Stewart
ClientNo propertyNo pAddress rentStart rentFinish rent ownerNo oName ClientNo propertyNo pAddress rentStart rentFinish rent ownerNo oName CR76 PG4 6 lawrence
St,Glasgow 1-Jul-00 31-Aug-01 350 CO40
Tina Murphy
CR76 PG16 5 Novar Dr,
Glasgow 1-Sep-02 1-Sep-02 450 CO93
Tony Shaw
Glasgow Shaw
CR56 PG4 6 lawrence
St,Glasgow 1-Sep-99 10-Jun-00 350 CO40
Tina Murphy
CR56 PG36 2 Manor Rd,
Glasgow 10-Oct-00 1-Dec-01 370 CO93
Tony Shaw
CR56 PG16 5 Novar Dr,
Glasgow 1-Nov-02 1-Aug-03 450 CO93
Tony Shaw
Second Normal Form
DBMS A table is in second normal form if the table is in the first
normal form and every non-primary key column is functionally dependent upon the entire primary key.
No non-primary key column can be functionally dependent on part of the primary key.
If A and B are 2 columns, B is fully functionally dependent on A.B is not dependent on any subset of A.
Second Normal Form
DBMS
A table in the first normal form will be in second normal form if any one of the following applies:
ª The primary key is composed of only one column ª No non-keyed columns exist in the table.
ª Every non-keyed attribute is dependent on all of the columns contained in the primary key.
Second Normal Form
DBMS Steps to Remove Partial Dependencies
ª Determine which non-key columns are not dependent upon the table’s entire primary key.
ª Remove those columns from the base table. Create a second table with those non-keyed columns and a copy of the columns from the primary key that they are dependent upon
primary key that they are dependent upon.
ª Create a foreign key from the original base table to the new table, linking to the new primary key.g p y y
2NF ClientRental relation
DBMS The ClientRental relation has the following partial
dependencies:
clientNo Æ cName (Partial dependency)
f t N Æ Add t N N (P ti l d d )
2NF ClientRental relation
DBMS
Client (clientNo, cName)
R t l ( li tN t N tSt t tFi i h)
Client Rental
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName) ClientNo cName
CR76 John Kay CR56 Aline Stewart
Client
ClientNo propertyNo rentStart rentFinish CR76 PG4 1-Jul-00 31-Aug-01 CR76 PG16 1-Sep-02 1-Sep-02 Rental CR56 PG4 1-Sep-99 10-Jun-00 CR56 PG36 10-Oct-00 1-Dec-01 CR56 PG16 1-Nov-02 1-Aug-03 PropertyOwner
propertyNo pAddress rent ownerNo oName PG4 6 lawrence St,Glasgow 350 CO40 Tina Murphy PG16 5 Novar Dr, Glasgowg 450 CO93 Tony Shawy PG36 2 Manor Rd, Glasgow 370 CO93 Tony Shaw
Third Normal Form
DBMS A table is in third normal form if every non-keyed column isA table is in third normal form if every non keyed column is
directly dependent on the primary key, and not dependent on another non-keyed column.y
If the table is in second normal form and all of the “transitiveIf the table is in second normal form and all of the transitive dependencies” are removed, then every non-keyed column is said to be “dependent upon the key, the whole key, and nothingp p y y g but the key”.
Third Normal Form
DBMS Steps to Remove Transitive DependenciesSteps to Remove Transitive Dependencies
ª Determine which columns are dependent on another non-keyed column.
ª Remove those columns from the base table.
ª Create a second table with those columns and the non-key columns that they are dependent upon.
ª Create a foreign key in the original table linking to the primary key of the new table.
3NF ClientRental relation
DBMS The Transitive dependencies for the PropertyOwner relations is as follows:
PropertyOwner
3NF ClientRental relation
DBMS
The resulting 3NF relations have the forms:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish) Rental (clientNo, propertyNo, rentStart, rentFinish) PropertyOwner (propertyNo, pAddress, rent, ownerNo)
3NF ClientRental relation
DBMS ClientNo cName CR76 John Kay CR56 Ali St t ClientClientNo propertyNo rentStart rentFinish CR76 PG4 1-Jul-00 31-Aug-01 CR76 PG16 1-Sep-02 1-Sep-02
Rental
CR56 Aline Stewart CR76 PG16 1 Sep 02 1 Sep 02 CR56 PG4 1-Sep-99 10-Jun-00 CR56 PG36 10-Oct-00 1-Dec-01 CR56 PG16 1-Nov-02 1-Aug-03
propertyNo pAddress rent ownerNo
PropertyOwner
ownerNo oName
Owner
PG4 6 lawrence St,Glasgow 350 CO40 PG16 5 Novar Dr, Glasgow 450 CO93 PG36 2 Manor Rd, Glasgow 370 CO93
CO40 Tina Murphy CO93 Tony Shaw