Normal Forms &
Functional
Dependencies (II)
Lossless-join
Decomposition
Building database tables
➢A reasonable table, but has a lot of repetitive data
➢Where might a reasonable person break this table to reduce duplication? →
“Course Title” [Assume Course Title is unique]
Lossless-join Decomposition
➢Took the original table (r) and broken it on Course Title – what this means is that Course Title is the linkage between these two relations
➢Would say this is a good break as it results in a “lossless-join
decomposition”. We’ve broken a table and have not added or created data, if and only if:
➢π
(r)⨝ π
(r) → original table r➢Natural joining the two tables should yield the schema of r
Lossless-join Decomposition
➢What does this assume?
(In what case would it not end up being a lossless-join decomposition)
➢If ∩ = {empty set}
➢If gain/loose fields
➢If Course Title is not unique
• What if there was another Database Systems course taught by another school? If Course Title was not unique then the tables, when joined back together, might not work. Your record might be joined with the “other”
Database System course (or vice versa)
➢Therefore, the ∩ must be a SUPER KEY of either or (one of the two relations – not the original table)
A lose definition, not limited to “primary key”
(though the intersection COULD be a primary
Is this Lossless-join Decomposition?
➢If the table on the left was decomposed in this way, is it lossless-join?
➢Answer: NO!
A B C
1 2 3
4 5 6
7 2 8
A B
1 2
4 5
7 2
B C
2 3
5 6
2 8
decomposed
A B C
1 2 3
1 2 8
4 5 6
7 2 3
7 2 8
natural join
Is this Lossless-join Decomposition?
➢If the table on the left was decomposed in this way, is it lossless-join?
➢Answer: Yes! ☺ (AB ∩ BC) → BC
A B C
1 2 3
4 5 6
7 2 3
A B
1 2
4 5
7 2
B C
2 3
5 6
decomposed
A B C
1 2 3
4 5 6
7 2 3
natural join
Functional Dependencies
Schema Refinement and Sound Structures
➢We discussed why it was important to follow “good” database design and avoid
“bad” database design
➢We want to avoid redundancy and avoid anomalies (→ inconsistent state)
➢Redundancy exists because of the existence of integrity constraints, in particular Functional Dependences (FD)
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
The hourly_rate of Humpty can be derived from the hourly_rate of Mickey (or Minnie) since they are all 4th year, and we know year determins hourly_rate
Remember the Update Anomaly?
Another example (1) :
➢What if we attempt to update hourly_rate of Humpty?
• Only one copy of hourly_rate has been updated!
➢Hourly_rate determined by year=4 appears in multiple tuples in the table
➢Cannot change hourly_rate in just the 1st tuple
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
13
Remember the Update Anomaly?
Another example (2) :
➢What if we want to insert a new employee, who is in 2nd year, but we don’t know the hourly rate for the 2nd year?
➢Cannot insert a new employee into the table
➢Also, if we delete all employees who are in the 3rd year, we lose information about the hourly_rate for 3rd year!
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
aw6e Awesome 2 ?? Null ? 10
Potential Solution
➢We can solve the redundancy and anomaly by converting this (original) table:
➢Into these tables:
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
computingID name year hours_worked
ht1y Humpty 4 20
dt2y Dumpty 3 20
md3y Mickey 4 15
mn4e Minnie 4 16
dh5h Duhhuh 3 10
year hourly_rate
4 12
3 10
General Design Guidelines
➢Semantics of attributes should be self-evident
➢Avoid redundancy – between tuples, relations
➢Avoid NULL values in tuples
➢If certain tuples should not exist, don’t allow them
Database design is about
characterizing data and the organizing
Data Interrelationships
➢Domain knowledge – things in the real world
➢Pattern analysis
➢We can:
• Associate computingID with name computingID → name
• Associate year with hourly_rate year → hourly_rate
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
ATA_Emp (computingID, name, year, hourly_rate, hours_worked)
Functional Dependencies
➢Assume “SID” (Student ID) was an attribute in the Student table. It would probably be the primary key of the student table. We could say that SID
uniquely defines Name
• Wherever we see SID we’ll always see the same Name associated with it (otherwise it would not be a primary key)
➢However, you can have that notion (one attribute implies, or is associated with, or uniquely defines another attribute) without dealing with primary keys
• E.g., In every time you see Course Title it will always indicate the same number of credits [Database Systems→ 3 credits]
• E.g., last name uniquely defines first name, and my office
[Basit→ Nada, Rice405] (assumption: what do you think?)
➢We call these Functional Dependencies
Functional Dependencies
➢Functional Dependencies – constraint on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes; generalized notion of a key. Something can uniquely define another thing and NOT be a key!
➢Let R be a relation schema, A a subset of R, and B a subset of R.
The functional dependency A -> B holds on R if and only if for any legal
relations r(R), whenever any two tuples t1 and t2 of R agree on the attributes A, they also agree on the attributes B.
• That is, t1(A) = t2 (A) → t1(B) = t2 (B)
Functional Dependencies
➢Functional Dependencies – constraint on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes; generalized notion of a key. Something can uniquely define another thing and NOT be a key!
➢Let R be a relation schema, A a subset of R, and B a subset of R.
The functional dependency A -> B holds on R if and only if for any legal
relations r(R), whenever any two tuples t1 and t2 of R agree on the attributes A, they also agree on the attributes B.
• That is, t1(A) = t2 (A) → t1(B) = t2 (B)
• Put another way: if you know the value(s) associated with A, you know the value(s) associated with B
• e.g., A=LName with value Basit, will always imply B=(FName with value Nada, and Office with value Rice405)
• [Basit → Nada, Rice405] (assumption: one prof. per office)– do you understand why?
Functional Dependencies:
Beware…!
➢FD holds over a relation R if, for every allowable instance r of R, r satisfies the FD
➢For a FD: A→B
➢Given some instance r of R, we can check if it violates some FD or not
• table1.A = table2.A but table1.B
≠
table2.B➢Note:
• You cannot tell if a FD holds over R by looking at an instance
• Cannot prove non-existence of violation
Functional Dependencies
➢A could be “Student (first) Name” and B could be “Student ID”
➢A→B does not hold (1 →4 and 1→5)
➢B→A does hold (all unique)
(can have multiple people with first name Bob, with different computing IDs)
➢Can have multiple functional dependencies on a table
• Not all of them necessarily meaningful
• e.g. Reflexive property: A→A, self implication, A always implies itself
1 4
1 5
2 6
computingID name year hourly_rate hours_worked
ht1y Humpty 4 13 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
Examples
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
Year →hourly_rate holds
Year → hourly_rate does not hold
Let’s Try: FDs
Which of the following is a functional dependency?
A B C
1 aa x
1 aa x
2 bb y
2 cc y
3 bb z
(a)
A → B
(b)B → C
(c)AB → C
(d)A → C
Possible
Not enough information
(cannot prove non-existence of violation)
Let’s Try: FDs
➢List possible FD(s):
EmpID → Name, xPhone, Position
EmpID Name xPhone Position
E1001 Mickey 6543 Clerk
E2353 Minnie 1234 Helpdesk
E4567 Daisy 9876 Salesrep
E1234 Donald 9876 Salesrep
E9372 Humpty 1234 Lawyer
More Example
➢Do all the following FDs hold on this instance?
ProductName Category Color Department Price
Beyblade Gadget Green Toys 12
Drone Gadget Green Toys 80
ProductName → Color Category → Department Color, Category → Price
If we say that R satisfies an FD, we are stating a
constrainton R
PossibleFunctional Dependencies
➢K is a super key of R iff:
• K is a subset of R, and K→R (K implies the entire relation)
• Mathematical definition of a super key
➢K is a candidate key of R iff:
• For all A, that is a subset of K, A !→R
• No subset of K can imply R. No extraneous attributes used, MINIMAL!
• Mathematical definition of a candidate key
• Example:
• If K=(P, Q, R) is a candidate key of R P,Q,R → relation R
• Then, A=(P,Q) one example of a subset of K P,Q !→ relation R
➢When you build a table, you can create a set of Functional Dependencies that will be a part of that table
Notation
➢Some notation for upcoming discussions:
➢Relation (table): R
➢Column names abbreviated to letters: A, B, C, D, E,…
• A=FirstName; B=LastName; C=Age; D=Major; …
➢Table with attributes: R(A, B, C, D) a relation with four (4) attributes
➢Functional dependencies (FDs): {A→B, B→C, C→CD}
(set of functional dependencies on the table)
Functional Dependencies
➢Goal: use FDs to determine how to break a table
➢We’re interested to know:
• Does a particular table have a primary key.
• Does it have a super key we can use?
➢To do that, we have to use a set of rules to come up with the full set of possible FDs, ones that we could derive from the set of FDs given
➢(E.g. What are all the things “A” can imply? What about “B”… and so forth)
Rules
➢Reflexivity: if b is a subset of a, then a → b
➢Augmentation: if a->b, then ac → bc
➢Transitivity: if a->b, and b -> c, then a → c
➢Union: if a-> b and a->c, then a → bc
➢Decomposition: if a->bc, then a → b and a → c (separately)
➢Pseudo-transitivity: if a->b and cb -> d, then ac → d
• E.g. Cid → LastName and FirstName,LastName→Office, then Cid,FirstName → Office
If nb3f -> Basit and Nada,Basit -> Rice405 then nb3f,Nada -> Rice405
➢Given a set of FDs you can come up with all of the other potential FDs by following this set of rules
Attribute Closure and F+
➢Attribute closure (α+): Finding all of the things that can be uniquely defined from a particular attribute, by using these rules (e.g. find all the FDs that have that particular attribute on the left-hand side)
➢Closure of F (F+): Finding attribute closures for ALL attributes, coming up with all possible FDs, is called closure of a table, or F+
➢F+: Given a set of FDs F, there are other FDs that are logically implied. This set is called F+ (closure of F). Found through application of these rules
➢These are going to give us the rules for the best way to break down a table (motivation for doing this)
Attribute Closure and F+
➢An FD f is logically implied by a set of FDs F if f holds whenever all FDs in F hold
➢To compute the closure of a set of functional dependencies F
F+ = F repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f add the resulting functional dependencies to F+
for each pair of functional dependencies f1 and f2 in F+
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F+
until F+ does not change any further
Example: Attribute Closure (work together)
➢Given R(A,B,C)
• FDs = {A → B, B → C }
➢Compute the attribute closures for all attribute and combination of attributes.
Then, think about what can inferred.
A B C AB AC BC ABC
A ✓ ✓ ✓ ✓ ✓ ✓ ✓
B C AB AC BC ABC
Summary: attribute closure A+ = ABC
B+ = C+ = AB+ = AC+ = BC+ = ABC+ =
A B C AB AC BC ABC
A ✓ ✓ ✓ ✓ ✓ ✓ ✓
B ✓ ✓ ✓
C ✓
AB ✓ ✓ ✓ ✓ ✓ ✓ ✓
AC ✓ ✓ ✓ ✓ ✓ ✓ ✓
BC ✓ ✓ ✓
ABC ✓ ✓ ✓ ✓ ✓ ✓ ✓
Summary: attribute closure A+ = ABC
B+ = BC C+ = C AB+ = ABC AC+ = ABC BC+ = BC ABC+ = ABC
Example: Attribute Closure (work together)
➢Given R(A,B,C)
• FDs = {A → B, B → C }
➢Compute the attribute closures for all attribute and combination of attributes.
Then, think about what can inferred.
Exercise 1: Computing F+
➢R(A,B,C,D,E)
➢FDs = {
A→C, B→B, C→BD, D→E
}➢Using the rules, compute F+:
➢(Is one of these a super key of the table?)
➢(1) Write all LHS: (2) Copy FDs as is: (3) Use rules:
A→ A→ C A→ABCDE
B→ B→ B B→ B
C→ C→ B D C→ BCDE
D→ D→ E D→ DE
E→ E→ E→ E
➢F+ = { A → ABCDE, B → B, C → BCDE, D → DE, E → E }
➢Which attribute to break on? More coming up…!
Remember the Reflexive rule
Exercise 2: Computing F+
➢R(A,B,C,D,E)
➢FDs = {
A → BC, B → D, CD → E
}➢Using the rules, compute F+:
(1) write all LHS
& remaining A →
B → C → D → E → CD →
(2) copy FDs as is
A → BC
B → D
C → D → E →
CD → E
(3) apply reflexivity
A → ABC B → B D C → C
D → D
E → E
CD → CDE
(4) apply transitivity
A → ABCD B → B D C → C
D → D
E → E
CD → CDE
Decompose: A → A, A → B, A → C
Exercise 2: Computing F+ (cont’d)
(4) apply transitivity
A → ABCD B → B D C → C
D → D
E → E
CD → CDE (from previous page)
(5) apply transitivity
A → ABCDE B → B D C → C
D → D
E → E
CD → CDE Decompose: A → A, A → B, A → C, A → D Union: A → C and A → D, then A → CD
F+ = { A → ABCDE, B → BD, C → C, D → D, E → E, CD → CDE }
One more Example!
➢R(A,B,C,D,E)
➢FDs = {A→B, AB→D, B→BDE, C→D, D→D}
➢Calculate F+:
One more Example! - Solutions
➢R(A,B,C,D,E)
➢FDs = {A→B, AB→D, B→BDE, C→D, D→D}
➢Calculate F+:
➢(1)LHSs (2) Copy FDs as is (3)Reflexive (4)Use rules
A→ B A→ AB DE
B→ B DE B→ B DE
AB→ D AB→ AB DE →E
C→ D C→ CD
D→ D D→ D
E→ E E→ E
Summary
➢To break a table down we need another notation other than a key structure
➢A key: set of attributes that can uniquely define a tuple
➢But there are other things that can uniquely define other parts of the tuple (e.g. {State, ZIP}→City)
➢Note: in SQL we can choose a set of attributes to be a Primary key, but other relationships cannot be denoted in SQL
➢These implications (x→y, where x uniquely defines y) are called functional dependencies (FD)
➢FD can be on a key or super key, but does not have to be
Summary
➢With the notation of FDs, and given a large table that we want to split apart, we have to define what makes a good split – how to know where to split the table
1. Must be a lossless-join decomposition
• If split a table into two, and you natural join them back together, you get the original table. Splitting the table results in some columns in one table, some columns in another table, and usually one or two columns that
“define” the other side (that are in both tables)
• Know it’s a lossless-join decomposition if: when you split the table, the intersection of the two sub-tables is a super key of one of the two tables (it does NOT have to be the super key of the entire (original) table) 2. No data is lost
3. No data is created
Summary
Other terms:
➢Attribute closure (α+): applying the rules for an attribute
➢F+: closure of all attributes of the table
Coming Up Next…
➢From 1NF, there are a number of “numbered”-normal forms that increase their level of strictness on a DB
➢The two we’ll talk about is 3NF (Third Normal Form) and BCNF (Boyce- Codd Normal Form)
• They are the primary two ways of defining good database form
• (2NF – outdated, not used as much)