• No results found

Introduction to normalization. Introduction to normalization

N/A
N/A
Protected

Academic year: 2021

Share "Introduction to normalization. Introduction to normalization"

Copied!
28
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction to normalization

Lecture 4

Instructor Anna Sidorova

Introduction to normalization

Agenda

Presentation

Review of relational models, in class exersiseIntroduction to normalization

In-class exercisesDiscussion of HW2

(2)

Next class

Review for the midtermHW 2 is due

March 7 – midterm exam

Based on Hoffer, Prescott and Topi Modern Database Management, (c) Prentice Hall 2009

HW 2

Chapter 4, Problem 6 – develop a relational schema

Convert ERD for Ch. 2, Problem 20 (a part of your HW1) , ( p y )

into a relational schema (must be based on the correct solution)

Chapter 4, Problems 7 and 8 (we will discuss the relevant

material next class)

Handout – normalization exercises

Based on Hoffer, Prescott and Topi Modern Database Management, (c) Prentice Hall 2009

(3)

Review of relational data models

Review of relational data models

Different people have different

views of the database these

Figure 2-7 Three-schema architecture

database…these are the external

schema The internal schema is the d l i underlying design and implementation

(4)

Relation

Relation

Definition: A relation is a named, twoDefinition: A relation is a named, two--dimensional table of data dimensional table of data

Table consists of rows (records) and columns (attribute or field)Table consists of rows (records) and columns (attribute or field)

Requirements for a table to qualify as a relation:Requirements for a table to qualify as a relation:

 It must have a unique nameIt must have a unique name 

 Every attribute value must be atomic (not Every attribute value must be atomic (not multivaluedmultivalued, not composite), not composite) 

Every row must be unique (can’t have two rows with exactly the same values for all Every row must be unique (can’t have two rows with exactly the same values for all their fields)

their fields) 

Attributes (columns) in tables must have unique namesAttributes (columns) in tables must have unique names

 The order of the columns must be irrelevantThe order of the columns must be irrelevant 

 The order of the rows must be irrelevantThe order of the rows must be irrelevant

NOTE: all relations are in 1stNormal form

Translating ERD into relational schema

Map each entity into a relation

 Map each weak entity into a relation (include the identifier of the strong entity t f th i k )

as a part of the primary key)

Map each multivalued attribute into a relation (include the identifier of the

entity as a part of the primary key)

 Map many-to-many relationships and associative entities into a relation

Represent one-to-one and one-to-many relationships using foreign keys.

(5)

Normalization

Normalization

Learning Objectives

Define NormalizationDefine Normalization

Define 1st, 2ndand 3rdNormal Forms

(6)

Normalization: Definitions

Normalization is a method used to validate and improve a logical

design so that it satisfies certain constraints that avoid unnecessary duplication of data

The process of decomposing relations with anomalies to produce

smaller, well-structured relations

9.11 9.11

Well

Well--Structured Relations

Structured Relations

A relation that contains minimal data redundancy and allows users A relation that contains minimal data redundancy and allows users

to insert, delete, and update rows without causing data to insert, delete, and update rows without causing data to se t, e ete, a up ate ows w t out caus g ata to se t, e ete, a up ate ows w t out caus g ata inconsistencies

inconsistencies

Goal is to avoid anomaliesGoal is to avoid anomalies

Insertion AnomalyInsertion Anomaly––adding new rows forces user to create duplicate dataadding new rows forces user to create duplicate data 

Deletion AnomalyDeletion Anomaly––deleting rows may cause a loss of data that would be deleting rows may cause a loss of data that would be needed for other future rows

needed for other future rows 

Modification AnomalyModification Anomaly––changing data in a row forces changes to other changing data in a row forces changes to other rows because of duplication

rows because of duplication

12

(7)

Example

Example–

–Figure 5

Figure 5--2b

2b

13

Question–Is this a relation? Answer–Yes: Unique rows and no

multivalued attributes

Question–What’s the primary key? Answer–Composite: Emp_ID,

Course_Title

Anomalies in this Table

Anomalies in this Table

InsertionInsertion––can’t enter a new employee without having the employee take a can’t enter a new employee without having the employee take a

class class class class

DeletionDeletion––if we remove employee 140, we lose information about the if we remove employee 140, we lose information about the

existence of a Tax Acc class existence of a Tax Acc class

ModificationModification––giving a salary increase to employee 100 forces us to giving a salary increase to employee 100 forces us to

update multiple records update multiple records

Wh d h li i ?

Why do these anomalies exist?

Because there are two themes (entity types) in this one relation. This results in data duplication and an unnecessary

(8)

Normalization Process

The goal is to bring each relation into the Third Normal

Form.

The process bringing a relation into the 3rdNormal Form

Goes through stages.  1stNormal Form2ndNormal Form

3rdNormal Form

Functional Dependencies

Functional Dependency

A particular relationship between two attributes. For a given p p g

relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B

Instances (or sample data) in a relation do not prove the

existence of a functional dependency

Knowledge of problem domain is most reliable method for g p

identifying functional dependency

9.16 9.16

(9)

Functional Dependencies: Notations in

Problems

A B Attribute B is functionally dependent on attribute A (A determines B) A, B  C Attributes A and B together

determine attribute C

A B, C Both attributes, B and C are determined by (functionally determined by (functionally dependent on) attribute A

Functional Dependencies

We can draw functional dependencies between attributes of

l f ll a relation as follows:

STUDENT

Stud_ID F_Name L_Name E-mail

111 Mary Jones [email protected]

(10)

Important Definitions

Multivalued Attributes (repeating groups) – non-key

attributes or groups of non-key attributes the values of which are l d f d b d l d l

not uniquely identified by (directly or indirectly) (not

functionally dependent on) the value of the Primary Key (or its part).

STUDENT

Stud_ID Name Course_ID Units

STUDENT

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00 101 Lennon MSI 250 3.00 MSI 415 3.00 125 Jonson MSI 331 3.00

Important Definitions

A relation is unnormalized (not in the 1stNormal Form) if

h l l d b

STUDENT

Stud ID Name Course ID Units it has multivalued attributes or repeating groups. Repeating

Group

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

(11)

Important Definitions

A relation is in the 1stNormal Form if it has no multivalued

b

STUDENT

Stud ID Name Course ID Units attributes or repeating groups.

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

Important Definitions

Partial Dependency – when an non-key attribute is

d d b b h h l f COMPOSITE

determined by a part, but not the whole, of a COMPOSITE primary key.

CUSTOMER

Cust ID Name Order ID

Partial Dependency

Cust_ID Name Order_ID

101 AT&T 1234

101 AT&T 156

(12)

Important Definitions

A relation is NOT in the 2ndNormal Form if it has partial

d d

dependencies.

CUSTOMER

Cust_ID Name Order_ID

Partial Dependency 101 AT&T 1234 101 AT&T 156 125 Cisco 1250

Important Definitions

A relation is in the 2ndNormal Form if it is in the 1stNormal

h l d d

Form AND has no partial dependencies.

EMPLOYEE

E ID F N L N D t ID D t N

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct

(13)

Important Definitions

Transitive Dependency – when a non-key attribute

d h k b

determines another non-key attribute.

EMPLOYEE

E ID F N L N D t ID D t N

Transitive Dependency

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct

122 Sara Smith 2 Mktg

Important Definitions

A relation is NOT in the 3rdNormal Form if it has

transitive dependencies.

EMPLOYEE

E ID F N L N D t ID D t N

Transitive Dependency

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct

(14)

Important Definitions

A relation is in the 3rdNormal Form if it is in the 2nd

Normal Form and has no transitive dependencies.

EMPLOYEE

E ID F N L N D t ID

Emp_ID F_Name L_Name Dept_ID

111 Mary Jones 1

122 Sara Smith 2

Normal Forms: Review

Unnormalized –There are multivalued attributes or

repeating groups

1 NF – No multivalued attributes or repeating groups.2 NF – 1 NF plus no partial dependencies

3 NF – 2 NF plus no transitive dependencies

9.28 9.28

(15)

Example 1: Determine NF

ISBN Title

 ISBN  Publisher

All attributes are directly or indirectly determined  ISBN  Publisher

 Publisher  Address

BOOK

ISBN Title Publisher Address

by the primary key; therefore, the relation is

at least in 1 NF

ISBN Title Publisher Address

Example 1: Determine NF

ISBN Title

ISBN  Publisher

The relation is at least in 1NF. There is no COMPOSITEISBN  Publisher

 Publisher  Address

BOOK

ISBN Title Publisher Address

There is no COMPOSITE primary key, therefore there can’t be partial dependencies.

Therefore, the relation is at least in 2NF

(16)

Example 1: Determine NF

ISBN Title

 ISBN  Publisher

Publisher is a non-key attribute, and it determines Address,  ISBN  Publisher

 Publisher  Address

BOOK

ISBN Title Publisher Address

another non-key attribute. Therefore, there is a transitive dependency, which means that

the relation is NOT in 3 NF.

ISBN Title Publisher Address

Example 1: Determine NF

ISBN Title

ISBN  Publisher

We know that the relation is at least in 2NF, and it is not in 3 ISBN  Publisher

 Publisher  Address

BOOK

ISBN Title Publisher Address

NF. Therefore, we conclude that the relation is in 2NF.

(17)

Example 1: Determine NF

 ISBN Title

ISBN  P bl h

In your solution you will write the following justification:ISBN  Publisher  Publisher  Address following justification: 1) No M/V attributes, therefore at least 1NF 2) No partial dependencies, therefore at least 2NF 3) There is a transitive dependency

(Publisher  Address), therefore,

not 3NF

Conclusion: The relation is in 2NF BOOK

ISBN Title Publisher Address

Example 2: Determine NF

Product_ID  Description

ORDER

All attributes are directly or indirectly determined by the primary key; therefore, the relation

is at least in 1 NF

(18)

Example 2: Determine NF

Product_ID  Description

The relation is at least in 1NF.

There is a COMPOSITE Primary Key (PK) (Order_No, Product_ID), therefore there can be partial dependencies. Product_ID, which is a part of PK,

determines Description; hence, there is a partial dependency. Therefore, the relation is not 2NF. No

sense to check for transitive dependencies! ORDER

Order_No Product_ID Description

Example 2: Determine NF

Product_ID  Description

ORDER

We know that the relation is at least in 1NF, and it is not in 2 NF. Therefore, we conclude that the

relation is in 1 NF.

(19)

Example 2: Determine NF

Product_ID  Description

In your solution you will write the following justification:

1) No M/V attributes, therefore at least 1NF 2) There is a partial dependency (Product_ID  Description), therefore

not in 2NF

Conclusion: The relation is in 1NF ORDER

Order_No Product_ID Description

Example 3: Determine NF

 Part_ID  Description

 Part ID  Price determined by the primaryComp_ID and No are not

PART

_

Part_ID, Comp_ID  No

determined by the primary key; therefore, the relation is

NOT in 1 NF. No sense in looking at partial or transitive

dependencies.

(20)

Example 3: Determine NF

Product_ID  Description

 Product ID  Price

In your solution you will write the following justification: 1) There are M/V attributes;  Product_ID  Price

 Part_ID, Comp_ID  No

) ;

therefore, not 1NF Conclusion: The relation is

unnormalized.

PART

Part_ID Descr Price Comp_ID No

Bringing a Relation to 1NF

STUDENT

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

(21)

Bringing a Relation to 1NF

Option 1: Make a determinant of the repeating group (or a

multivalued attribute) a part of the primary key multivalued attribute) a part of the primary key.

STUDENT

Stud_ID Name Course_ID Units

Composite Primary Key 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

Bringing a Relation to 1NF

 Option 2: Remove the entire repeating group from the relation. Create another relation which would contain all the attributes of the repeating group, plus the primary key from the first relation. In this new relation,

h i k f h i i l l i d h d i f h

the primary key from the original relation and the determinant of the repeating group will comprise a primary key.

STUDENT

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00

101 Lennon MSI 415 3.00

(22)

Bringing a Relation to 1NF

STUDENT STUDENT_COURSE Stud_ID Name 101 Lennon 101 Lennon 125 Jonson

Stud_ID Course Units

101 MSI 250 3 101 MSI 415 3 125 MSI 331 3

Bringing a Relation to 2NF

STUDENT

Stud_ID Name Course_ID Units

Composite Primary Key

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

(23)

Bringing a Relation to 2NF

Goal: Remove Partial Dependencies

Partial

STUDENT

Stud_ID Name Course_ID Units

Composite Primary Key Dependencies 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

Bringing a Relation to 2NF

Remove attributes that are dependent from the part but not the whole of the

primary key from the original relation. For each partial dependency, create a new relation with the corresponding part of the primary key from the original new relation, with the corresponding part of the primary key from the original as the primary key.

STUDENT

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00

(24)

Bringing a Relation to 2NF

CUSTOMER

Stud_ID Name Course_ID Units

101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Jonson MSI 331 3.00 STUDENT_COURSE Stud_ID Course_ID 101 MSI 250 101 MSI 415 125 MSI 331 COURSE STUDENT Course_ID Units MSI 250 3.00 MSI 415 3.00 MSI 331 3.00 Stud_ID Name 101 Lennon 101 Lennon 125 Jonson

Bringing a Relation to 3NF

Goal: Get rid of transitive dependencies.

EMPLOYEE

Emp ID F Name L Name Dept ID Dept Name

Transitive Dependency

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct

(25)

Bringing a Relation to 3NF

Remove the attributes, which are dependent on a non-key

attributes from the original relation. For each transitive d d l i i h h k ib dependency, create a new relation with the non-key attributes which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent.

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct

122 Sara Smith 2 Mktg

Bringing a Relation to 3NF

EMPLOYEE

EMPLOYEE

Emp_ID F_Name L_Name Dept_ID Dept_Name

111 Mary Jones 1 Acct 122 Sara Smith 2 Mktg EMPLOYEE

Emp_ID F_Name L_Name Dept_ID

111 Mary Jones 1 122 Sara Smith 2 DEPARTMENT Dept_ID Dept_Name 1 Acct 2 Mktg

(26)

Other Normal Forms

(from Appendix B)

Boyce-Codd NF

ll d d d k h d h

All determinants are candidate keys…there is no determinant that is

not a unique identifier

Usually, if a relation is in #NF it is in the BCNF, except when a part of

the primary key is determined by a non-key attribute.

4

th

NF and 5

th

NF – used primarily for theoretical purposes

Merging Relations

View Integration – Combining entities from multiple ER models into

common relations

Issues to watch out for when merging entities from different ER

models:

 Synonyms – two or more attributes with different names but same meaning  Homonyms – attributes with same name but different meanings

 Transitive dependencies – even if relations are in 3NF prior to merging, they may not be after merging

(27)

Enterprise Keys – advice from some

experts

Primary keys that are unique in the whole database, not just

ithi i l l ti within a single relation

Corresponds with the concept of an object ID in object-oriented

(28)

In class exercise

See handout

Based on Hoffer, Prescott and Topi Modern Database Management, (c) Prentice Hall 2009

References

Related documents

The air temperature values in this study varied from the lowest value (12 C) which was recorded at station 1 during December, 2013 to the highest value (42 C) recorded during

Thorax : Pronotum as wide as long (ratio between width and length = 1.0 in medium), sides somewhat rounded between the transversal constrictions; glabrous, with some white

Very recently, in [24], de- cidability of satisfiability and entailment was obtained for a fragment of symbolic- heap separation logic with (“linearly compositional”)

Setting up your first basic store: This chapter covers creating a first basic store with Shopify and introduces the admin, where you’ll develop and run your store8. The key

Facet joint arthropathy - osteophyte formation and distortion of joint alignment MRI Axial T2 L3-L4 disk Psoas Paraspinal muscles Psoas Paraspinal NP AF MRI Axial T2 PACS, BIDMC

Specifications where the dependent variable is time for child rearing and time for housework include: individual fixed effects, family type fixed effects, year fixed effects,

Your membership dollars fund alumni programs and student scholarships that help ensure the success, honor the tradi- tions and enrich the experience that is Boise State

Respecto al total del período estudiado la presencia de Redes Sociales como parte intrínseca de las campañas analizadas es de un 63% y el grado máximo de participación del