SQL (3): Joins
CS 4750 – Database Systems
CARTESIAN PRODUCT, NATURAL JOIN, INNER JOINS, OUTER JOINS, SELF JOINS
1
[A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch.4.1]
[ https://www.dofactory.com/sql/join ]
Join
➢
➢
➢
2
left
table
right table FULL JOIN
left table
right table LEFT JOIN
left table
right table RIGHT JOIN
left table
right table (INNER) JOIN
left table
right table NATURAL JOIN
(preserve all rows, uncommon, thus skip) (most common)
(preserve all rows in right table) (preserve all rows
in left table)
What we’ll Cover
➢
➢
➢
➢
➢
➢
➢
➢
3
Cartesian Product
→
4
Cartesian Product
➢
➢
▪
▪
▪
▪
5
Cartesian Product
Reference: Used/adapted from towardsdatascience 6
Cartesian Product
Sometimes known as
“cross product”
𝑅 𝑥 𝑆
Structured Query Language (SQL)
SELECT * FROM R, S { WHERE C }
7
The Cartesian product of two relations R and S is the set of pairs that can be formed by pairing each tuple of R with each tuple of S.
Resulting schema is the schema of R
followed by the schema of S. We can rename attributes to avoid ambiguity/duplicate
names
Cartesian Product
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one JOIN two;
for each row1 in one:
for each row2 in two:
output (row1.p1, row1.colA, row1.colB,
row2.p2, row2.p1, row2.colX, row2.colY) SELECT *
FROM one, two;
p1 colA colB p2 p1 colX colY
1 A aa 9001 1 55 ABC
1 A aa 9005 5 50 KLM
1 A aa 9004 7 62 GHI
1 A aa 9003 3 95 DEF
1 A aa 9007 7 83 XYZ
1 A aa 9002 3 77 PQR
1 A aa 9006 2 42 STU
2 B bb 9002 3 77 PQR
2 B bb 9006 2 42 STU
2 B bb 9001 1 55 ABC
2 B bb 9005 5 50 KLM
2 B bb 9004 7 62 GHI
2 B bb 9003 3 95 DEF
2 B bb 9007 7 83 XYZ
3 C cc 9003 3 95 DEF
3 C cc 9007 7 83 XYZ
... … … ... … … …
Cartesian Product
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB)
two(p2, p1, colX, colY)
one × two
Cartesian Product:
Color Coded Example
10
A B C
a b c
d a f
c b d
D E F
x y z
p q r
A B C D E F
a b c x y z
a b c p q r
d a f x y z
d a f p q r
c b d x y z
c b d p q r
R
S
R natural join S (R × S)
Cartesian Product: Another example
Natural Join
JOIN ON COMMON NAMED ATTRIBUTE(S) . IF NO MATCH FOUND, MOVE ON
12
RECALL: Queries Involving Multiple Relations
SELECT computingID, name, TAHiring.year, hourly_rate, hours_worked FROM TAHiring, Payrate
WHERE TAHiring.year = Payrate.year
• Combine the tables based on common columns
computingID name year hours_worked
ht1y Humpty 4 20
dt2y Dumpty 3 20
md3y Mickey 4 15
mn4e Minnie 4 16
dh5h Duhhuh 3 10
year hourly_rate
4 12
3 10
TAHiring
Payrate
“Natural join”
computingID name year hourly_rate hours_worked
ht1y Humpty 4 12 20
dt2y Dumpty 3 10 20
md3y Mickey 4 12 15
mn4e Minnie 4 12 16
dh5h Duhhuh 3 10 10
Natural Join
Reference: Used/adapted from towardsdatascience 14
Natural Join
Structured Query Language (SQL)
SELECT * FROM R, S
WHERE R.B=S.B AND R.C=S.C
SELECT *
FROM R NATURAL JOIN S
15
The Natural Join of two relations R and S is a set of pairs of tuples, one from R and one from S, that agree on whatever
attributes are common to the schemas of R
and S.
NATURAL JOIN
left table
right table NATURAL JOIN
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one NATURAL JOIN two;
for each row1 in one:
for each row2 in two:
if (row1.p1 = row2.p1):
output (row1.p1, row1.colA, row1.colB, row2.p2, row2.colX, row2.colY)
Join on common named
attribute(s).
If no match
found, move on.
p1 colA colB p2 colX colY
1 A aa 9001 55 ABC
2 B bb 9006 42 STU
3 C cc 9002 77 PQR
3 C cc 9003 95 DEF
5 E ee 9005 50 KLM
7 G gg 9004 62 GHI
7 G gg 9007 83 XYZ
NATURAL JOIN
SELECT * FROM one
NATURAL JOIN two
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
left table
right
table
NATURAL JOIN
natural join
Natural Join Example
Natural Join: Example
19
A B C
a b c
d b c
b b f
c a d
B C D
b c d
b c e
a d b
A B C D
a b c d
a b c e
d b c d
d b c e
c a d b
R
S
R natural join S
The Natural Join eliminates replicated attributes
(e.g. columns “B” and “C” show up only once in the R ⋈ S table.)
(Inner) Join
JOIN ON SPECIFIED ATTRIBUTE(S) . IF NO MATCH FOUND, MOVE ON.
20
(INNER) JOIN
left table
right table (INNER) JOIN
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one JOIN two ON one.p1 = two.p1;
Join on specified attribute(s).
If no match found, move on.
for each row1 in one:
for each row2 in two:
if (row1.p1 = row2.p1):
output (row1.p1, row1.colA, row1.colB,
row2.p2, row2.p1, row2.colX, row2.colY)
p1 colA colB p2 p1 colX colY
1 A aa 9001 1 55 ABC
2 B bb 9006 2 42 STU
3 C cc 9002 3 77 PQR
3 C cc 9003 3 95 DEF
5 E ee 9005 5 50 KLM
7 G gg 9004 7 62 GHI
7 G gg 9007 7 83 XYZ
(INNER) JOIN
SELECT *
FROM one JOIN two ON one.p1 = two.p1
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one INNER JOIN two ON one.p1 = two.p1
left table
right
table
(INNER) JOIN
Natural
Left Outer Join
JOIN ON COMMON NAMED ATTRIBUTE(S) .
IF NO MATCH FOUND, PAD WITH NULL VALUES
23
NATURAL LEFT OUTER JOIN
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one NATURAL LEFT OUTTER JOIN two;
Join on common named
attribute(s).
If no match found, pad with
NULL values
Preserve all rows in left table
left table
right
table
LEFT JOIN
p1 colA colB p2 colX colY
1 A aa 9001 55 ABC
2 B bb 9006 42 STU
3 C cc 9002 77 PQR
3 C cc 9003 95 DEF
4 D dd NULL NULL NULL
5 E ee 9005 50 KLM
7 G gg 9004 62 GHI
7 G gg 9007 83 XYZ
8 H hh NULL NULL NULL
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
NATURAL LEFT OUTER JOIN
left table
right
table
LEFT JOIN
Left Outer Join
JOIN ON SPECIFIED ATTRIBUTE(S) .
IF NO MATCH FOUND, PAD WITH NULL VALUES
26
LEFT OUTER JOIN
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one LEFT OUTTER JOIN two ON one.p1 = two.p1;
Join on specified attribute(s).
If no match found, pad with
NULL values
left table
right table LEFT JOIN
Preserve all rows in
left table
p1 colA colB p2 p1 colX colY
1 A aa 9001 1 55 ABC
2 B bb 9006 2 42 STU
3 C cc 9002 3 77 PQR
3 C cc 9003 3 95 DEF
4 D dd NULL NULL NULL NULL
5 E ee 9005 5 50 KLM
7 G gg 9004 7 62 GHI
7 G gg 9007 7 83 XYZ
8 H hh NULL NULL NULL NULL
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
LEFT OUTER JOIN
left table
right
table
LEFT JOIN
Natural
Right Outer Join
JOIN ON COMMON NAMED ATTRIBUTE(S) .
IF NO MATCH FOUND, PAD WITH NULL VALUES
29
NATURAL RIGHT OUTTER JOIN
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
SELECT *
FROM one NATURAL RIGHT OUTTER JOIN two;
Join on common named
attribute(s).
If no match found, pad with
NULL values
left table
right table RIGHT JOIN
Preserve all rows
in right table
p1 colA colB p2 colX colY
1 A aa 9001 55 ABC
2 B bb 9006 42 STU
3 C cc 9002 77 PQR
3 C cc 9003 95 DEF
5 E ee 9005 50 KLM
7 G gg 9004 62 GHI
7 G gg 9007 83 XYZ
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
NATURAL RIGHT OUTTER JOIN
left table
right
table
RIGHT JOIN
Right Outer Join
JOIN ON SPECIFIED ATTRIBUTE(S) .
IF NO MATCH FOUND, PAD WITH NULL VALUES
32
RIGHT OUTTER JOIN
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
Join on specified attribute(s).
If no match found, pad with
NULL values
left table
right table RIGHT JOIN
SELECT *
FROM one RIGHT OUTTER JOIN two ON one.p1 = two.p1;
Preserve all rows
in right table
p1 colA colB
1 A aa
2 B bb
3 C cc
4 D dd
5 E ee
7 G gg
8 H hh
p2 p1 colX colY
9001 1 55 ABC
9002 3 77 PQR
9003 3 95 DEF
9004 7 62 GHI
9005 5 50 KLM
9006 2 42 STU
9007 7 83 XYZ
one(p1, colA, colB) two(p2, p1, colX, colY)
RIGHT OUTTER JOIN
p1 colA colB p2 p1 colX colY
1 A aa 9001 1 55 ABC
2 B bb 9006 2 42 STU
3 C cc 9002 3 77 PQR
3 C cc 9003 3 95 DEF
5 E ee 9005 5 50 KLM
7 G gg 9004 7 62 GHI
7 G gg 9007 7 83 XYZ
left table
right
table
RIGHT JOIN
Self Joins
JOINING A TABLE WITH ITSELF
35
Self Joins
➢
➢
➢
➢
36
(Challenging) Example: Self Joins
TA_id Name Year Two_week_hours
1234 Minnie 4 20
2345 Humpty 3 24
3456 Dumpty 4 30
3333 Minnie 3 12
5678 Mickey 2 16
TA_id Course
1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis
5678 Web Programming Lang.
2345 Software Analysis
Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)
( Challenging ) Example: Self Joins
TA_id Name Year Two_week_hours
1234 Minnie 4 20
2345 Humpty 3 24
3456 Dumpty 4 30
3333 Minnie 3 12
5678 Mickey 2 16
TA_id Course
1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis
5678 Web Programming Lang.
2345 Software Analysis
Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)
SELECT H.Name
FROM Hiring AS H, TA_Assignment AS T WHERE H.TA_id = T.TA_id AND
T.Course = "Database Systems" AND T.Course = "Software Analysis";
Will this work?
No. Returns empty set
(Challenging) Example: Self Joins
TA_id Name Year Two_week_hours
1234 Minnie 4 20
2345 Humpty 3 24
3456 Dumpty 4 30
3333 Minnie 3 12
5678 Mickey 2 16
TA_id Course
1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis
5678 Web Programming Lang.
2345 Software Analysis
Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)
SELECT H.Name
FROM Hiring AS H, TA_Assignment AS T1, TA_Assignment AS T2
WHERE H.TA_id = T1.TA_id AND H.TA_id = T2.TA_id AND
T1.Course = "Database Systems" AND T2.Course = "Software Analysis";
Will this work?
Name Minnie Humpty
TA_id Name Year Two_week_hours
1234 Minnie 4 20
2345 Humpty 3 24
3456 Dumpty 4 30
3333 Minnie 3 12
5678 Mickey 2 16
TA_id Course
1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis
5678 Web Programming Lang.
2345 Software Analysis
Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)
SELECT H.Name
FROM Hiring AS H, TA_Assignment AS T1, TA_Assignment AS T2
WHERE H.TA_id = T1.TA_id AND H.TA_id = T2.TA_id AND
T1.Course = "Database Systems" AND T2.Course = "Software Analysis";
Let’s consider each part of the query
All pairs of courses a TA is assigned
For each pair, check for courses
(Challenging) Example: Self Joins
Cross Table with itself:
Cartesian Product Example
➢
➢Cross a table with itself, can rename one of the tables (or both)
➢Match where C 1 = C 2 and D 1 ≠ D 2
➢So, customer Ann ate >1 dish
41
Customer Dish Ann Chicken
Bob Fish
Ann Veal
EatLog
C
1D
1C
2D
2Ann Ch Ann Ch
Ann Ch Bob Fi
Ann Ch Ann Ve
Bob Fi Ann Ch
Bob Fi Bob Fi
Bob Fi Ann Ve
Ann Ve Ann Ch
Ann Ve Bob Fi
Ann Ve Ann Ve
EatLog x EatLog
→
→
42
Union and Intersect
NOTE: INTERSECT IS NOT SUPPORTED BY MYSQL (THERE ARE WORKAROUNDS)
43
RECALL: UNION
(SELECT S_id, Course FROM Student_lecture1) UNION (SELECT S_id, Course FROM Student_lecture2)
• Combine two tables
• Each tables must have the same number of columns
• The columns must have similar / compatible data types
• The columns in both tables must be in the same order
• Tuples of the second table comes after the first
Student_lecture1
S_id Address Course Teaching_assistant
1234 57 Hockanum Blvd Database Systems Minnie 2345 1400 E. Bellows Database Systems Humpty
S_id Address Course Teaching_assistant
3456 900 S. Detroit Cloud Computing Dumpty
Student_lecture2
S_id Course
1234 Database Systems 2345 Database Systems 3456 Cloud Computing
Set: Union
𝑅 ∪ 𝑆
Structured Query Language (SQL)
(SELECT * FROM R) UNION
(SELECT * FROM S)
45
The Union of two relations R and S is the set of tuples that are in R or in S or in both.
R and S must have identical sets of
attributes and the types of the attributes must be the same (domain)
Attributes of R and S must occur in the same order
This Photoby Unknown Author is licensed under CC BY-SA
Set: Union
Reference: Used/adapted from towardsdatascience 46
What question do you think this query is answering?
“What are the book IDs of the books that Charlie or Mike borrowed?”
Reminder of natural join
RECALL: INTERSECT
(SELECT S_id FROM Student_lecture1) INTERSECT (SELECT S_id FROM Student_lecture2)
• Combine two tables
• Each table must have the same number of columns
• The columns must have similar / compatible data types
• The columns in both tables must be in the same order
• Tuples of the second table comes after the first
Student_lecture1
S_id Address Course Teaching_assistant
1234 57 Hockanum Blvd Database Systems Minnie 2345 1400 E. Bellows Database Systems Humpty
S_id Address Course Teaching_assistant
3456 900 S. Detroit Cloud Computing Dumpty 1234 57 Hockanum Blvd Machine Learning Mickey
Student_lecture2
S_id 1234
Set: Intersection
𝑅 ∩ 𝑆
Structured Query Language (SQL)
(SELECT * FROM R) INTERSECT
(SELECT * FROM S)
48
The Intersection of two relations R and S is the set of tuples that are in both R and S.
R and S must have identical sets of attributes and the types of the attributes must be the same (domain)
Attributes of R and S must occur in the same order
This Photoby Unknown Author is licensed under CC BY-SA
Intersection is not supported by all versions of SQL, such as MySQL. So there usually are
workarounds (e.g. natural join)
Set: Intersection
Reference: Used/adapted from towardsdatascience 49
What question do you think this query is answering?
Member IDs of the members who have borrowed both
the books “Fences” and “Inheritance”.
Union, Intersection, Difference
50