• No results found

Introduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms

N/A
N/A
Protected

Academic year: 2021

Share "Introduction to Database Systems. Chapter 4 Normal Forms in the Relational Model. Chapter 4 Normal Forms"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction to Database Systems

Winter term 2013/2014

Melanie Herschel

[email protected] Université Paris Sud, LRI

1

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Chapter 4

Normal Forms in the Relational Model

After completing this chapter, you should be able to

• work with

functional dependencies (FDs): define

them, detect them in DB schemas, decide

implication, determine keys.

• explain insert, update, and delete

anomalies,

• explain

BCNF, test a given relation for BCNF,

and transform a relation into BCNF

2 1. Introduction 2. ER-Modeling 3. Relational model(ing) 4. Relational algebra 5. SQL

Chapter 4

Normal Forms

• Functional Dependencies (FDs) • Anomalies, FD-based Normal

Forms

(2)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Introduction

4

Relational database design theory is mainly based on a class of

constraints called Functional Dependencies (FDs).

FDs are a generalization of keys.

This theory defines when a relation is in normal form (e.g., in Third Normal Form or 3NF) for a given set of FDs.

It is usually a sign of bad DB design if a schema contains relations that violate the normal form requirements.

There are exceptions and tradeoffs, however.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Introduction

5

If a normal form is violated, data is stored redundantly, and information

about different concepts is intermixed.

Redundant data storage

The studio address is stored multiple times (once for each movie the studio has produced). The table also mixes the concepts of Studios and Movies.

MID Title Year Studio StudioAddress 1 The Incredibles 2004 Disney Burbank, CA

2 Bambi 1942 Disney Burbank, CA

3 Avatar 2010 20th Century Century City, CA Movie

Introduction

Today, the Third Normal Form (3NF) is considered the standard relational normal form used in practice (and education).

Boyce-Codd Normal Form (BCNF) is a bit more restrictive, easier to define, and will be a better match for our intuition of good DB design.

In rare circumstances, a relation might not have an equivalent BCNF

form while preserving all its FDs (this is always guaranteed for 3NF).

In a nutshell, BNCF requires that all FDs are already enforced

(3)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Introduction

7

Normalization algorithms can construct good relation schemas from a

set of attributes and a set of FDs alone.

In practice, relations are derived from ER models and normalization is used as an additional check (for BCNF) only.

When an ER model is well designed, the resulting derived relational tables will automatically be in BCNF.

Awareness of normal forms can help to detect design errors already in the ER design phase.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

First Normal Form

8 First normal form (1NF)

First Normal Form (1NF) requires that all table entries are

atomic (not lists, sets, records, relations)

The relational model is already defined this way. All further normal forms assume that tables are in 1NF.

A few modern DBMS allow table entries to be non-atomic (structured). Such systems are commonly referred to as NF2 DBMS (NF2 = NFNF = Non-First Normal Form).

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

First Normal Form

9 Note: A discussion of 1NF does not really belong here (1NF is not based on functional dependencies at all).

NF2 table

AID Name DOB Movies

1 Pitt 4.6.75

2 Jolie 18.12.63

Actor

Title Year Rating

Wanted 2008 3

Title Year Rating

Babel 2006 7

(4)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

First Normal Form

10

It is not a violation of 1NF if

1. A table entry contains values of a domain that exhibit an internal structure (e.g., all Genres of a Movie are collected in a

VARCHAR(100) attribute and are separated by spaces),

2.Related information is represented by several columns for a single row (e.g., columns Genre1, Genre2, Genre3 to represent up to three genres a of a movie).

These are signs of bad design anyway.

Why?

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Functional Dependencies

11

An instance of a functional dependency (FD) in our latest example

tables is

Studio ! StudioAddress

Whenever two rows of a relation agree in the the attribute Studio, they must also agree in the StudioAddress column values:

Redundant data storage

MID Title Year Studio StudioAddress 1 The Incredibles 2004 Disney Burbank, CA

2 Bambi 1942 Disney Burbank, CA

3 Avatar 2010 20th Century Century City, CA Movie

Functional Dependencies

Intuitively, the reason for the validity of Studio ! StudioAddress is that studio address only depends on the studio itself, not on other movie data.

This FD is read as

“Studio {functionally, uniquely} determines StudioAddress.”

Also: “Studio is a determinant for StudioAddress.”

(5)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Functional Dependencies

13

A key uniquely determines all attributes of its relation. In the example,

the FDs MID ! TITLE, MID ! Year, MID ! Studio, MID ! StudioAddress hold, too.

There will never be two distinct rows with the same MID, so the FD condition is trivially satisfied.

The FD Studio ! Title is not satisfied, for example. At least two rows agreeing on Studio disagree on Title:

Redundant data storage

Movie

MID Title Year Studio StudioAddress 1 The Incredibles 2004 Disney Burbank, CA

2 Bambi 1942 Disney Burbank, CA

3 Avatar 2010 20th Century Century City, CA

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Functional Dependencies

14

In general, an FD takes the form

A1,...,An → B1,...,Bm

Sequence and multiplicity of attributes in an FD are unimportant, since both sides formally are sets of attributes:

{A1,...,An} → {B1,...,Bm}

In discussing FDs, the focus is on a single relation R. All Ai, Bj are

attributes of the same relation R.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Functional Dependencies

15 Functional dependency

The

functional dependency (FD)

A

1

, ..., A

n

→ B

1

, ..., B

m

holds for a relation R in a database state I if and only if for all

tuples t, u ∈ I(R):

(6)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Functional Dependencies

16

An FD with m attributes on the right-hand side (rhs)

A

1

, ..., A

n

→ B

1

, ..., B

m

is equivalent to the m FDs:

A

1

, ..., A

n

→ B

1

⋮ ⋮

A

1

, ..., A

n

→ B

m

Thus, in the following discussion it suffices to consider FDs with a single column name on the rhs.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Functional Dependencies are Constraints

17

For the DB design process, the only interesting FDs are those that hold

for all possible database states.

FDs are constraints (like keys).

Non-interesting FDs

In this example state, FD Title ! Year holds. But this is not true in general (movies with equal titles may appear in different years). It is a task of DB design to verify that a FD is not mere coincidence.

Movie

MID Title Year Studio StudioAddress 1 The Incredibles 2004 Disney Burbank, CA

2 Bambi 1942 Disney Burbank, CA

3 Avatar 2010 20th Century Century City, CA

FDs vs. Keys

FDs are a generalization of keys:

A1, ..., An is a key of relation R(A1, ..., An, B1, ..., Bm)

A1, ..., An → B1, ..., Bm holds

Given the FDs for a relation R, one can derive a key for R, i.e., as a set of attributes A1, ..., An that functionally determines the other attributes

(7)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Example

19

The following table holds information about “All-time-favorite movies”

and their respective actors. Identifying FDs

Several actors can star in a movie and we have one actor per row. Attribute No is used to identify the position of an actor in the movie credits.

Movies

Actor No Title Year MID

Tim Robbins 1 The Shawshank Redemption 1994 1 Morgan Freeman 2 The Shawshank Redemption 1994 1

Marlon Brando 1 The Godfather 1972 2

Al Pacino 2 The Godfather 1972 2

James Caan 3 The Godfather 1972 2

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Example

20

The MID uniquely identifies a movie. Thus, the following FD holds

MID ! Title, Year Equivalently, we could use two FDs:

MID ! Title MID ! Year

Since a movie may have several actors, the following FD does not hold:

MID ! Actor

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Example

21

One actor can star in many movies, thus the following FD may not be

assumed although it happens to hold in the given example DB state: Actor ! Title

It is possible that there are movies with the same title but with different actors or with different years. So Title determines no other attribute.

For every movie (we now know that movies are identified by MID) there can only be one first (second, third, ...) actor:

MID, No ! Actor

At first glance, the actor of any given movie is also uniquely assigned a position in the credit sequence.

MID, Actor ! Num

However, this is violated by credits that include a same actor twice (e.g., Tracey Ullman speaks both Neil van Dort and Hildegarde in Corpse Bride).

(8)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Example

22

We might also be tempted to assume the FD

Title, Year, No ! Actor

If, in a TV series the credit sequence changes over a season, we are in trouble.

During DB design, only unquestionable conditions should be used as FDs.

DB normalization alters the table structure depending on the specified

FDs.

If it turns out later that an FD indeed does not hold, it is not sufficient to simply remove a constraint (e.g., a key) from the schema with an ALTER TABLE SQL statement.

Instead, tables will be created or deleted, data will be copied, and application programs may have to be changed.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

FD Quiz

23

FD-Quiz

1.Which FDs should hold for this table in general? 2.Identify an FD that holds in this state but not in general.

MovieShows

CinemaID CName CAddress Showtime Showroom Capacity MID

1 Ufa Stuttgart 14:00 A 50 1 1 Ufa Stuttgart 16:00 A 50 1 1 Ufa Stuttgart 14:00 B 100 2 2 Cinestar Berlin 20:00 X 200 1 2 Cinestar Berlin 20:00 Y 150 3 3 Korso Stuttgart 19:00 1 50 3

FD Quiz - Notes

(9)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Implication of FDs

25

Note: the FD MID ! StudioAddress is a consequence since we already

knew MID ! Studio and Studio ! StudioAddress. Whenever A → B and B →C hold, A → C is automatically satisfied.

Studio ! Studio trivially holds, but is not interesting.

FDs of the form A → A always hold (for every DB state). Implication of FDs

A set of FDs {α

1

→ β

1

,...,α

n

→ β

n

}

implies an FD α → β if and

only if every DB state which satisfies α

i

→ β

i

,1≤ i ≤ n, also

satisfies α → β.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Implication of FDs

26

The DB designer is normally not interested in all FDs which hold, but

only in a representative FD set that implies all other FDs. Armstrong Axioms

Reflexivity:

If β ⊆ α, then α → β.

Augmentation:

If α → β, then α ∪ γ → β ∪ γ.

Transitivity:

If α → β and β → γ, then α → γ.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Implication of FDs

27

However, a simpler way to check whether

α

→ β is implied by a given

FD set is to compute the cover α+ of α and then to check if β ⊆ α+.

Cover

The

cover α

+

of a set of attributes α is the set of all attributes B

that are uniquely determined by the attributes α (with respect to

a given FD set

F):

α

+

:= {B |

F implies α → B}

If necessary, write α

+F

to make the dependence on

F explicit.

(10)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Implication of FDs

28 Cover computation Input:! α (set of attributes) α1 ! β1,...,αn ! βn (set of FDs F) Output:! α+ (set of attributes) x ← α;

while x did change do

foreach given FD αi ! βi do if αi ⊆ x then x ← x ∪ βi ; fi od od return x;

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Implication of FDs

29

Compute {MID}+

Consider the following FDs:

MID ! Title, Studio MID, No ! Actor

Studio ! StudioAddress 1. We start with x = {MID}.

2. The FD MID ! Title, Studio has a lhs that is completely contained in the current attribute set x.

3. Extend x by the FD’s rhs attribute set, giving x = {MID, Title, Studio}.

4. Now, the FD Studio ! StudioAddress becomes applicable. 5.Add rhs attribute set of the FD to the current attribute set x, giving

x = {MID, Title, Studio, StudioAddress}

6.The FD MID, No ! Actor is still not applicable (attribute No ∉ x). 7.No further way to extend set x, the algorithm returns

{MID}+ = {MID, Title, Studio, StudioAddress}

8.We may now conclude, e.g., MID ! StudioAddress

How to Determine Keys

Given a set of FDs and the set of all attributes A of a relation R, one can determine all possible keys of R:

α ⊆A is key of R α+ = A

Remember: Normally, we are interested in minimal keys only.

A superset of a key is a key again, e.g., if {Title, Year} is key for relation Movie then so is {Title, Year, Length}). We thus usually require that every

(11)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

How to Determine Keys

31

1. Start to construct a key x iteratively with the empty attribute set (x = ∅).

2. In order to avoid non-minimal keys, make sure that the set x never contains the

rhs B of an FD α → B when x already contains the lhs, i.e., α ⊆ x.

Without loss of generality, we assume the rhs of all FDs to consist of a single attribute only.

3. As long as x does not yet determine all attributes of R (i.e., x+A), choose any of

the remaining attributes X ∈ A − x+.

4. Now the process splits (need to follow both alternatives): (a)Simply add X to x.

(b)For each FD α→X,add α to x.

The search for the key branches. We need to backtrack to such a branch if

We run into a dead-end (determine a non-minimal key), or

We have found a key and want to look for an alternative minimal key.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

How to Determine Keys

32 Key Computation (call with x = ∅)

procedure key(x,A,F) foreach α ! B ∈ F do

if α ⊆ x and B ∈ x and B ∉ α then

return; /* x not minimal */ fi

od

if x+ = A then

print x; /*found a minimal key x*/ else X ← any element of A − x+; key(x ∪ {X}, A, F); foreach α!X ∈ F do key (x ∪ α, A, F); od fi

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

FD Quiz

33

Determine keys for a relation

Assume F = { MovieID, CreditPos → Actor, MovieID → title }

1. Does F imply MovieID, CreditPos → title? 2.Determine a key for relation MovieActors

MovieActors

MovieID Title CreditPos Actor

1 Star Wars 1 Mark Hamill

1 Star Wars 2 Harrison Ford 2 Star Trek 1 Patrick Stewart 2 Star Trek 2 Johnathan Frakes

(12)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

FD Quiz - Notes

34

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Determinants

35 Determinants (Non-trivial FDs)

The attribute set A

1

,...,A

n

is called a

determinant for attribute set

B

1

,...,B

m

if and only if

the FD A

1

,...,A

n

→ B

1

,...B

m

holds, and

the lhs is minimal, i.e., whenever any A

i

is removed then

A

1

,...,A

i−1

,A

i+1

,A

n

→ B

1

,...B

m

does not hold, and

the lhs and rhs are distinct, i.e., {A

1

,...,A

n

} ≠ {B

1

,...,B

m

}.

Chapter 4

Normal Forms

• Functional Dependencies (FDs) • Anomalies, FD-based Normal

(13)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Consequences of Bad DB Design

37

It is normally a severe sign of bad DB design if a table contains an FD

(encodes a partial function) that is not implied by a key (e.g., Studio →

StudioAddress).

Among other problems, this leads to the redundant storage of certain facts (here: studio addresses):

Redundant data storage

MID Title Year Studio StudioAddress 1 The Incredibles 2004 Disney Burbank, CA

2 Bambi 1942 Disney Burbank, CA

3 Avatar 2010 20th Century Century City, CA Movie

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Consequences of Bad DB Design

38

If a schema permits redundant data, additional constraints need to be specified to ensure that the redundant copies indeed agree.

In the example, this constraint is precisely the FD Studio → StudioAddress.

General FDs, however, are not standard constraints of the relational model and thus unsupported by RDBMS.

The solution is to transform FDs into key constraints. This is what DB normalization tries to do.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Consequences of Bad DB Design

Update Anomalies

39

Update anomalies:

When a single mini-world fact needs to be changed (e.g., a change in the studio address for Disney), multiple tuples must be updated. This complicates application programs.

Redundant copies potentially get out of sync and it is impossible/hard to identify the correct information.

Application programs unexpectedly receive multiple answers when a single answer was expected:

SELECT INTO breaks due to unexpected result SELECT DISTINCT StudioAddress INTO :address

FROM MOVIE

(14)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Consequences of Bad DB Design

Insertion Anomalies

40

Insertion anomalies:

The address of a new studio cannot be inserted into the database until it is known what movies it will produce.

Since MID is a key, it cannot be set to NULL.

Such insertion anomalies arise when unrelated concepts (here: movies and

studios) are stored together in a single table.

Remember: a relational table should encode a single (partial) function.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Consequences of Bad DB Design

Deletion Anomalies

41

Deletion anomalies:

When the last movie of a studio is deleted, its address is lost. Even if MID could be set to NULL: it would be strange that all movies produced by a studio may be deleted but the last.

Again: such deletion anomalies arise when unrelated concepts (here: movies

and studios) are stored together in a single table.

Boyce-Codd Normal Form

Boyce-Codd Normal Form

A relation R is in Boyce-Codd Normal Form (BCNF) if and only if all its FDs

are already implied by its key constraints.

For relation R in BCNF and any FD A1,...,An → B1,...,Bm of R one of the

following statements hold:

1. the FD is trivial, i.e., {B1,...,Bm} ⊆ {A1,...,An}, or

2. the FD follows from a key, because {A1,...,An} or a subset of it is already a

(15)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Boyce-Codd Normal Form

43

BCNF example

The relation

MOVIE(MID, Title, Studio, StudioAddress) with FDs

MID ➝ Title, Studio, StudioAddress Studio ➝ StudioAddress

is not in BCNF because the FD Studio ➝ StudioAddress is not implied by a key:

Studio is not a key of the entire relation

The FD is not trivial.

However, without the attribute StudioAddress (and the second FD) the relation MOVIE(MID, Title, Studio) is in BCNF.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Boyce-Codd Normal Form

44

BCNF example

A movie set can be booked by at most one movie at a time.

MovieSet(MID, Title, Date, Time, Location) The relation thus satisfies the following FDs (plus implied ones):

MID ➝ Title, Date, Time, Location Date, Time, Location ➝ MID The keys of MovieSet are:

MID

Date, Time, Location

Both FDs are implied by keys (in this case, their lhs even coincide with the keys), thus MovieSet is in BCNF.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Boyce-Codd Normal Form

45

BCNF example

Consider the relation

ACTOR(AID, Name, Fortune) and the following FDs:

AID ➝ Name, AID ➝ Fortune Name, Fortune ➝ Name

AID, Name ➝ Fortune This relation is in BCNF:

The two first FDs indicate that AID is a key. Both FDs are thus implied by a key.

The second FD is trivial (and may be ignored).

(16)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

BCNF Quiz

46

BCNF Quiz

1. Is Movie(MID, Title, CreditPos, Actor) with the following FDs in BCNF?

MID, CreditPos ➝ Actor MID ➝ Title

2. Is Invoice(Inv_No, Date, Amount, Cust_No, Cust_Name) with the following FDs in BCNF?

Inv_No ➝ Date, Amount, Cust_No Inv_No, Date ➝ Cust_Name

Cust_No ➝ Cust_Name Date, Amount ➝ Date

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

BCNF Quiz - Notes

47

Third Normal Form

Third Normal Form (3NF) is slightly weaker than BCNF: if a relation is in BCNF, it is automatically in 3NF.

The differences are small. For most practical applications, the two can be considered as equivalent.

BCNF is a clear concept: we want no non-trivial FDs except those which are implied by a key constraint.

(17)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Third Normal Form

49 Third Normal Form

A relation R is in Third Normal Form (3NF) if and only if every FD A1,...,An → B (assume that FDs with multiple attributes on rhs have been expanded) satisfies at least one of the following conditions:

(1)The FD is trivial, i.e., B ∈ {A1,...,An},or

(2)The FD follows from a key, because {A1,...,An} or some subset of it is already a key of R, or

(3)B is a key attribute, i.e., element of a candidate key of R.

•Intuitively: “Each non-key attribute must provide a fact about the key, the whole key, and nothing but the key”.

•The only difference to BCNF is the additional clause (3).

•An attribute is called a non-key attribute (of R), if it does not appear in any minimal key of R. The minimality requirement is important here. Otherwise all attributes of R would qualify as key attributes. All attributes that are not non-key attributes are key attributes.

•3NF means that every determinant of a non-key attribute is a key (see clauses (2), (3)).

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Third Normal Form

50

3NF example

Given the relation

Shows(Title, Cinema, City) assume the following FDs:

Cinema ➝ City Title, City ➝ Cinema

(assumes a movie is not shown twice in the same city) The keys of Shows are:

Title, City

Cinema, Title

The first FD violates BCNF, because Cinema is not a super-key.

However, Shows is in 3NF because City is a key-attribute (part of the first key).

Note: whenever keys do not overlap and 3NF is satisfied, then the relation automatically also satisfies BCNF.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Second Normal Form

51

The Second Normal Form (2NF) is only interesting for historical reasons.

If a relation is in 3NF or BCNF, it is automatically in 2NF.

2NF is “too weak”, e.g., the Movie(MID, Title, Year, Studio, StudioAddress) example actually satisfies 2NF (although it exhibits anomalies).

Second Normal Form

A relation is in Second Normal Form (2NF) if and only if every non-key attribute depends on the complete key.

(18)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Second Normal Form

52

A relation is in 2NF if and only if the given FDs do not imply an FD A1,...,An

→ B such that

1. A1,...,An is a strict subset of a minimal key, and

2. B is not contained in any minimal key (i.e., is a non-key attribute).

2NF counter example

The MovieActors table is not

in 2NF: Title is already uniquely

determined by MovieID, i.e., a part of the key.

MovieActors

MovieID Title CreditPos Actor

1 Star Wars 1 Mark Hamill

1 Star Wars 2 Harrison Ford 2 Star Trek 1 Patrick Stewart 2 Star Trek 2 Johnathan Frakes

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Splitting Relations

53

If a table R is not in BCNF, we can split it into two tables (decomposition).

The violating FD determines how to split.

If the FD A1,...,An → B1,...Bmviolates BCNF, create a new relation

S(A1,...,An,B1,...,Bm) and remove B1,...,Bm from the original relation R.

Splitting “along an FD”

Remember: the FD

Studio → StudioAddress is the reason why table

Movie(MID, Title, Studio, StudioAddress) violates BCNF.

Split the table into

MovieList(MID, Title, Studio) AddressList (Studio, StudioAddress)

Splitting Relations

It is important that this splitting transformation is lossless, i.e., that the original relation can be reconstructed by means of a (natural) join:

Splitting “along an FD”

Movie = MovieList ⋈ AddressList

(19)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Splitting Relations

55 Decomposition Theorem

The split of relations is guaranteed to be lossless if the intersection of the attributes of the new tables is a key of at least one of them.

The join connects tuples depending on the attribute (values) in the intersection. If these values uniquely identify tuples in the other relation we do not lose information.

Original table Decomposition “Reconstruction”

(key A, B, C) R1 R2 R1 on R2 A B C a11 b11 c11 a11 b11 c12 a11 b12 c11 A B a11 b11 a11 b12 A C a11 c11 a11 c12 A B C a11 b11 c11 a11 b11 c12 a11 b12 c11 a11 b12 c12 “Lossy” decomposition

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Splitting Relations

56

All splits initiated by the above method for transforming relations into BCNF

satisfy the condition of the decomposition theorem.

It is always possible to transform a relation into BCNF by lossless splitting (if

necessary, split repeatedly).

Lossless split condition satisfied

{MID, Title, Studio} ∩ {Studio, StudioAddress} = {Studio}

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Splitting Relations

57

However, not every lossless split is reasonable:

Example

Splitting Movie into MovieTitle(MID, Title) and MovieYear(MID, Year) is lossless, but

the split is not necessary to enforce a normal form,

only requires costly joins in subsequent queries.

Full splitting down to binary tables on the physical DB level, however, leads to high-performance DBMS implementations.

MID Title Year

1 The Incredibles 2004

2 Bambi 1942

3 Avatar 2010

(20)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Splitting Relations

58

Losslessness means that the resulting schema (after splitting) can represent

all DB states which were possible before.

We can translate states from the old schema into the new schema and back. The new schema supports all queries supported by the old schema.

However, the new schema allows states which do not correspond to a state in the old schema (we may now store studios and studio addresses without

any affiliation to a movie).

The new schema is more general.

If studios without movies are possible in the real world, the decomposition removes a fault in the old schema (insertion and update anomalies).

If they are not,

-

a new constraint is needed that is not necessarily easier to enforce than the FD, but

-

at least the redundancy is avoided (update anomaly).

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Splitting Relations

59

Finally, note that although computable columns (such as Age which is

derivable from Birthdate) lead to violations of BCNF, splitting the relation is not the right solution.

The split would yield a relation R(Birthday, Age) which would try to materialize the computable function.

The correct solution is to eliminate Age from the original table and to define a view which contains all columns plus the computed column Age (invoking

a SQL stored procedure).

Preservations of FDs

Besides losslessness, a property that a good decomposition of a relation should guarantee is the preservation of FDs.

The problem is that an FD can refer only to attributes of a single relation.

When you split a relation into two, there might be FDs that can no longer be

expressed (these FDs are not preserved).

FD gets lost during decomposition

Consider Addresses(Street, City, State, ZIP) with FDs

(21)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Preservation of FDs

61

Probably, most designers would not split the table (the table is in 3NF

but violates BCNF).

If there are many addresses with the same ZIP code, there will be significant redundancy. If you split, queries will involve more joins.

If this were a DB for a mailing company, a separate table of ZIP codes

might be of interest on its own. Then the split looks feasible.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

3NF Synthesis Algorithm

62

The following algorithm (divided in 2 sub-algorithms on this and the next slide), namely the synthesis algorithm, produces a lossless decomposition

of a given relation R into 3NF relations that preserves the FDs.

Determine a minimal (canonical) set of FDs that is equivalent to the given FDs

(a) Replace every FD α → B1, . . . , Bm by the corresponding FDs α → Bi, 1 ≤ i ≤ m. Let

the result be F.

(b) Minimize the lhs of every FD in F.

For each FD A1,...,An → B and each i = 1,...,n compute {A1,...,Ai−1,Ai+1,...,An}+F. If the result contains B, replace F by

(F − {A1,...,An → B}) ∪ {A1,...,Ai−1,Ai+1,...,An → B}, and repeat. (c) Remove implied FDs.

For each FD α → B, compute the cover α+F′ where F′ = F −{α → B}. If the cover contains B (i.e., the FD is already implied by the FDs in F), continue with F ← F .

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

3NF Synthesis Algorithm

63 Synthesis Algorithm

(1) Compute a canonical (minimal) set of FDs F (see (a) - (c) on

previous slide).

(2) For each lhs α of an FD in F create a relation with attributes

A = α ∪ {B | α → B ∈ F }.

(3) If none of the relations constructed in step (2) contains a key of the original relation R, add one relation containing the attributes of a key of R.

(4) For any two relations R1,2 constructed in steps (2), (3) , if the schema

(22)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Summary

64

Tables should not contain FDs other than those implied by the keys

(i.e., all tables should be in BCNF).

Such violating FDs indicate the combination of pieces of information that should be stored separately (presence of an embedded function). This leads to redundancy.

A relation may be normalized into BCNF by splitting it.

Sometimes it may make sense to avoid a split (and thus to violate BCNF). The DB designer has to carefully resolve such scenarios, incorporating application or domain knowledge.

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Summary

• Functional dependencies: detect them in DB schemas, decide

implication, determine keys.

• Insert, update, and delete anomalies.

• Normal forms based on FDs: BCNF, 3NF, 2NF

• Synthesis algorithm to produce schema in 3NF

65

BCNF 3NF

2NF 1NF

References

Related documents

Find the mean, variance, and standard deviation of the sampling distribution of the sample means.. 

To date, private equity firms have focused on different stages of a portfolio company’s evolution, entering at an early stage with venture capital funds, later for growth capital,

(authorAffiliation is not fully functionally dependent on the first key) 2) Student (rollNo, name, dept, sex, hostelName, roomNo,. admitYear) Keys: rollNo,

In addition, we see that choosing a string similarity metric based on its performance on the OAEI test sets leads to good relative performance on analogous ontology matching

(b) Disability or health benefits.--Any amounts AMOUNTS contributed to an account, any increase INCREASES in the value of the account and any qualified withdrawal WITHDRAWALS

Also, the sound, particularly for the undamped cases, right from the start includes frequencies much lower than the fundamental of the plucked string — a clear example of the

or direct use of them via the SynthBot software; the analysis of synthesizer parametric redundancy; establishing why some parameter settings are harder to find than others;

The significant accounting policies adopted in preparing the financial report of Equatorial Resources Limited (“Equatorial” or “Company”) and its consolidated