• No results found

SQL (3): Joins. CS 4750 Database Systems SELF JOINS

N/A
N/A
Protected

Academic year: 2022

Share "SQL (3): Joins. CS 4750 Database Systems SELF JOINS"

Copied!
50
0
0

Loading.... (view fulltext now)

Full text

(1)

SQL (3): Joins

CS 4750 – Database Systems

CARTESIAN PRODUCT, NATURAL JOIN, INNER JOINS, OUTER JOINS, SELF JOINS

1

[A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, Ch.4.1]

[ https://www.dofactory.com/sql/join ]

(2)

Join

2

left

table

right table FULL JOIN

left table

right table LEFT JOIN

left table

right table RIGHT JOIN

left table

right table (INNER) JOIN

left table

right table NATURAL JOIN

(preserve all rows, uncommon, thus skip) (most common)

(preserve all rows in right table) (preserve all rows

in left table)

(3)

What we’ll Cover

3

(4)

Cartesian Product

4

(5)

Cartesian Product

5

(6)

Cartesian Product

Reference: Used/adapted from towardsdatascience 6

(7)

Cartesian Product

Sometimes known as

“cross product”

𝑅 𝑥 𝑆

Structured Query Language (SQL)

SELECT * FROM R, S { WHERE C }

7

The Cartesian product of two relations R and S is the set of pairs that can be formed by pairing each tuple of R with each tuple of S.

Resulting schema is the schema of R

followed by the schema of S. We can rename attributes to avoid ambiguity/duplicate

names

(8)

Cartesian Product

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one JOIN two;

for each row1 in one:

for each row2 in two:

output (row1.p1, row1.colA, row1.colB,

row2.p2, row2.p1, row2.colX, row2.colY) SELECT *

FROM one, two;

(9)

p1 colA colB p2 p1 colX colY

1 A aa 9001 1 55 ABC

1 A aa 9005 5 50 KLM

1 A aa 9004 7 62 GHI

1 A aa 9003 3 95 DEF

1 A aa 9007 7 83 XYZ

1 A aa 9002 3 77 PQR

1 A aa 9006 2 42 STU

2 B bb 9002 3 77 PQR

2 B bb 9006 2 42 STU

2 B bb 9001 1 55 ABC

2 B bb 9005 5 50 KLM

2 B bb 9004 7 62 GHI

2 B bb 9003 3 95 DEF

2 B bb 9007 7 83 XYZ

3 C cc 9003 3 95 DEF

3 C cc 9007 7 83 XYZ

... ...

Cartesian Product

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB)

two(p2, p1, colX, colY)

one × two

(10)

Cartesian Product:

Color Coded Example

10

A B C

a b c

d a f

c b d

D E F

x y z

p q r

A B C D E F

a b c x y z

a b c p q r

d a f x y z

d a f p q r

c b d x y z

c b d p q r

R

S

R natural join S (R × S)

(11)

Cartesian Product: Another example

(12)

Natural Join

JOIN ON COMMON NAMED ATTRIBUTE(S) . IF NO MATCH FOUND, MOVE ON

12

(13)

RECALL: Queries Involving Multiple Relations

SELECT computingID, name, TAHiring.year, hourly_rate, hours_worked FROM TAHiring, Payrate

WHERE TAHiring.year = Payrate.year

• Combine the tables based on common columns

computingID name year hours_worked

ht1y Humpty 4 20

dt2y Dumpty 3 20

md3y Mickey 4 15

mn4e Minnie 4 16

dh5h Duhhuh 3 10

year hourly_rate

4 12

3 10

TAHiring

Payrate

“Natural join”

computingID name year hourly_rate hours_worked

ht1y Humpty 4 12 20

dt2y Dumpty 3 10 20

md3y Mickey 4 12 15

mn4e Minnie 4 12 16

dh5h Duhhuh 3 10 10

(14)

Natural Join

Reference: Used/adapted from towardsdatascience 14

(15)

Natural Join

Structured Query Language (SQL)

SELECT * FROM R, S

WHERE R.B=S.B AND R.C=S.C

SELECT *

FROM R NATURAL JOIN S

15

The Natural Join of two relations R and S is a set of pairs of tuples, one from R and one from S, that agree on whatever

attributes are common to the schemas of R

and S.

(16)

NATURAL JOIN

left table

right table NATURAL JOIN

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one NATURAL JOIN two;

for each row1 in one:

for each row2 in two:

if (row1.p1 = row2.p1):

output (row1.p1, row1.colA, row1.colB, row2.p2, row2.colX, row2.colY)

Join on common named

attribute(s).

If no match

found, move on.

(17)

p1 colA colB p2 colX colY

1 A aa 9001 55 ABC

2 B bb 9006 42 STU

3 C cc 9002 77 PQR

3 C cc 9003 95 DEF

5 E ee 9005 50 KLM

7 G gg 9004 62 GHI

7 G gg 9007 83 XYZ

NATURAL JOIN

SELECT * FROM one

NATURAL JOIN two

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

left table

right

table

NATURAL JOIN

(18)

natural join

Natural Join Example

(19)

Natural Join: Example

19

A B C

a b c

d b c

b b f

c a d

B C D

b c d

b c e

a d b

A B C D

a b c d

a b c e

d b c d

d b c e

c a d b

R

S

R natural join S

The Natural Join eliminates replicated attributes

(e.g. columns “B” and “C” show up only once in the R ⋈ S table.)

(20)

(Inner) Join

JOIN ON SPECIFIED ATTRIBUTE(S) . IF NO MATCH FOUND, MOVE ON.

20

(21)

(INNER) JOIN

left table

right table (INNER) JOIN

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one JOIN two ON one.p1 = two.p1;

Join on specified attribute(s).

If no match found, move on.

for each row1 in one:

for each row2 in two:

if (row1.p1 = row2.p1):

output (row1.p1, row1.colA, row1.colB,

row2.p2, row2.p1, row2.colX, row2.colY)

(22)

p1 colA colB p2 p1 colX colY

1 A aa 9001 1 55 ABC

2 B bb 9006 2 42 STU

3 C cc 9002 3 77 PQR

3 C cc 9003 3 95 DEF

5 E ee 9005 5 50 KLM

7 G gg 9004 7 62 GHI

7 G gg 9007 7 83 XYZ

(INNER) JOIN

SELECT *

FROM one JOIN two ON one.p1 = two.p1

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one INNER JOIN two ON one.p1 = two.p1

left table

right

table

(INNER) JOIN

(23)

Natural

Left Outer Join

JOIN ON COMMON NAMED ATTRIBUTE(S) .

IF NO MATCH FOUND, PAD WITH NULL VALUES

23

(24)

NATURAL LEFT OUTER JOIN

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one NATURAL LEFT OUTTER JOIN two;

Join on common named

attribute(s).

If no match found, pad with

NULL values

Preserve all rows in left table

left table

right

table

LEFT JOIN

(25)

p1 colA colB p2 colX colY

1 A aa 9001 55 ABC

2 B bb 9006 42 STU

3 C cc 9002 77 PQR

3 C cc 9003 95 DEF

4 D dd NULL NULL NULL

5 E ee 9005 50 KLM

7 G gg 9004 62 GHI

7 G gg 9007 83 XYZ

8 H hh NULL NULL NULL

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

NATURAL LEFT OUTER JOIN

left table

right

table

LEFT JOIN

(26)

Left Outer Join

JOIN ON SPECIFIED ATTRIBUTE(S) .

IF NO MATCH FOUND, PAD WITH NULL VALUES

26

(27)

LEFT OUTER JOIN

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one LEFT OUTTER JOIN two ON one.p1 = two.p1;

Join on specified attribute(s).

If no match found, pad with

NULL values

left table

right table LEFT JOIN

Preserve all rows in

left table

(28)

p1 colA colB p2 p1 colX colY

1 A aa 9001 1 55 ABC

2 B bb 9006 2 42 STU

3 C cc 9002 3 77 PQR

3 C cc 9003 3 95 DEF

4 D dd NULL NULL NULL NULL

5 E ee 9005 5 50 KLM

7 G gg 9004 7 62 GHI

7 G gg 9007 7 83 XYZ

8 H hh NULL NULL NULL NULL

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

LEFT OUTER JOIN

left table

right

table

LEFT JOIN

(29)

Natural

Right Outer Join

JOIN ON COMMON NAMED ATTRIBUTE(S) .

IF NO MATCH FOUND, PAD WITH NULL VALUES

29

(30)

NATURAL RIGHT OUTTER JOIN

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

SELECT *

FROM one NATURAL RIGHT OUTTER JOIN two;

Join on common named

attribute(s).

If no match found, pad with

NULL values

left table

right table RIGHT JOIN

Preserve all rows

in right table

(31)

p1 colA colB p2 colX colY

1 A aa 9001 55 ABC

2 B bb 9006 42 STU

3 C cc 9002 77 PQR

3 C cc 9003 95 DEF

5 E ee 9005 50 KLM

7 G gg 9004 62 GHI

7 G gg 9007 83 XYZ

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

NATURAL RIGHT OUTTER JOIN

left table

right

table

RIGHT JOIN

(32)

Right Outer Join

JOIN ON SPECIFIED ATTRIBUTE(S) .

IF NO MATCH FOUND, PAD WITH NULL VALUES

32

(33)

RIGHT OUTTER JOIN

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

Join on specified attribute(s).

If no match found, pad with

NULL values

left table

right table RIGHT JOIN

SELECT *

FROM one RIGHT OUTTER JOIN two ON one.p1 = two.p1;

Preserve all rows

in right table

(34)

p1 colA colB

1 A aa

2 B bb

3 C cc

4 D dd

5 E ee

7 G gg

8 H hh

p2 p1 colX colY

9001 1 55 ABC

9002 3 77 PQR

9003 3 95 DEF

9004 7 62 GHI

9005 5 50 KLM

9006 2 42 STU

9007 7 83 XYZ

one(p1, colA, colB) two(p2, p1, colX, colY)

RIGHT OUTTER JOIN

p1 colA colB p2 p1 colX colY

1 A aa 9001 1 55 ABC

2 B bb 9006 2 42 STU

3 C cc 9002 3 77 PQR

3 C cc 9003 3 95 DEF

5 E ee 9005 5 50 KLM

7 G gg 9004 7 62 GHI

7 G gg 9007 7 83 XYZ

left table

right

table

RIGHT JOIN

(35)

Self Joins

JOINING A TABLE WITH ITSELF

35

(36)

Self Joins

36

(37)

(Challenging) Example: Self Joins

TA_id Name Year Two_week_hours

1234 Minnie 4 20

2345 Humpty 3 24

3456 Dumpty 4 30

3333 Minnie 3 12

5678 Mickey 2 16

TA_id Course

1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis

5678 Web Programming Lang.

2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

(38)

( Challenging ) Example: Self Joins

TA_id Name Year Two_week_hours

1234 Minnie 4 20

2345 Humpty 3 24

3456 Dumpty 4 30

3333 Minnie 3 12

5678 Mickey 2 16

TA_id Course

1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis

5678 Web Programming Lang.

2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

SELECT H.Name

FROM Hiring AS H, TA_Assignment AS T WHERE H.TA_id = T.TA_id AND

T.Course = "Database Systems" AND T.Course = "Software Analysis";

Will this work?

No. Returns empty set

(39)

(Challenging) Example: Self Joins

TA_id Name Year Two_week_hours

1234 Minnie 4 20

2345 Humpty 3 24

3456 Dumpty 4 30

3333 Minnie 3 12

5678 Mickey 2 16

TA_id Course

1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis

5678 Web Programming Lang.

2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

SELECT H.Name

FROM Hiring AS H, TA_Assignment AS T1, TA_Assignment AS T2

WHERE H.TA_id = T1.TA_id AND H.TA_id = T2.TA_id AND

T1.Course = "Database Systems" AND T2.Course = "Software Analysis";

Will this work?

Name Minnie Humpty

(40)

TA_id Name Year Two_week_hours

1234 Minnie 4 20

2345 Humpty 3 24

3456 Dumpty 4 30

3333 Minnie 3 12

5678 Mickey 2 16

TA_id Course

1234 Database Systems 2345 Database Systems 3456 Cloud Computing 1234 Software Analysis

5678 Web Programming Lang.

2345 Software Analysis

Hiring(TA_id, Name, Year, Two_week_hours) TA_Assignment(TA_id, Course)

SELECT H.Name

FROM Hiring AS H, TA_Assignment AS T1, TA_Assignment AS T2

WHERE H.TA_id = T1.TA_id AND H.TA_id = T2.TA_id AND

T1.Course = "Database Systems" AND T2.Course = "Software Analysis";

Let’s consider each part of the query

All pairs of courses a TA is assigned

For each pair, check for courses

(Challenging) Example: Self Joins

(41)

Cross Table with itself:

Cartesian Product Example

➢Cross a table with itself, can rename one of the tables (or both)

➢Match where C 1 = C 2 and D 1 ≠ D 2

➢So, customer Ann ate >1 dish

41

Customer Dish Ann Chicken

Bob Fish

Ann Veal

EatLog

C

1

D

1

C

2

D

2

Ann Ch Ann Ch

Ann Ch Bob Fi

Ann Ch Ann Ve

Bob Fi Ann Ch

Bob Fi Bob Fi

Bob Fi Ann Ve

Ann Ve Ann Ch

Ann Ve Bob Fi

Ann Ve Ann Ve

EatLog x EatLog

(42)

42

(43)

Union and Intersect

NOTE: INTERSECT IS NOT SUPPORTED BY MYSQL (THERE ARE WORKAROUNDS)

43

(44)

RECALL: UNION

(SELECT S_id, Course FROM Student_lecture1) UNION (SELECT S_id, Course FROM Student_lecture2)

• Combine two tables

• Each tables must have the same number of columns

• The columns must have similar / compatible data types

• The columns in both tables must be in the same order

• Tuples of the second table comes after the first

Student_lecture1

S_id Address Course Teaching_assistant

1234 57 Hockanum Blvd Database Systems Minnie 2345 1400 E. Bellows Database Systems Humpty

S_id Address Course Teaching_assistant

3456 900 S. Detroit Cloud Computing Dumpty

Student_lecture2

S_id Course

1234 Database Systems 2345 Database Systems 3456 Cloud Computing

(45)

Set: Union

𝑅 ∪ 𝑆

Structured Query Language (SQL)

(SELECT * FROM R) UNION

(SELECT * FROM S)

45

The Union of two relations R and S is the set of tuples that are in R or in S or in both.

R and S must have identical sets of

attributes and the types of the attributes must be the same (domain)

Attributes of R and S must occur in the same order

This Photoby Unknown Author is licensed under CC BY-SA

(46)

Set: Union

Reference: Used/adapted from towardsdatascience 46

What question do you think this query is answering?

“What are the book IDs of the books that Charlie or Mike borrowed?”

Reminder of natural join

(47)

RECALL: INTERSECT

(SELECT S_id FROM Student_lecture1) INTERSECT (SELECT S_id FROM Student_lecture2)

• Combine two tables

• Each table must have the same number of columns

• The columns must have similar / compatible data types

• The columns in both tables must be in the same order

• Tuples of the second table comes after the first

Student_lecture1

S_id Address Course Teaching_assistant

1234 57 Hockanum Blvd Database Systems Minnie 2345 1400 E. Bellows Database Systems Humpty

S_id Address Course Teaching_assistant

3456 900 S. Detroit Cloud Computing Dumpty 1234 57 Hockanum Blvd Machine Learning Mickey

Student_lecture2

S_id 1234

(48)

Set: Intersection

𝑅 ∩ 𝑆

Structured Query Language (SQL)

(SELECT * FROM R) INTERSECT

(SELECT * FROM S)

48

The Intersection of two relations R and S is the set of tuples that are in both R and S.

R and S must have identical sets of attributes and the types of the attributes must be the same (domain)

Attributes of R and S must occur in the same order

This Photoby Unknown Author is licensed under CC BY-SA

Intersection is not supported by all versions of SQL, such as MySQL. So there usually are

workarounds (e.g. natural join)

(49)

Set: Intersection

Reference: Used/adapted from towardsdatascience 49

What question do you think this query is answering?

Member IDs of the members who have borrowed both

the books “Fences” and “Inheritance”.

(50)

Union, Intersection, Difference

50

Each tuple in R and each tuple in S are put into the

resulting relation

A tuple is in the

resulting relation if and only if it is in both R and S

Only tuples that are in R but not in S are put into the

resulting relation

References

Related documents

While the current research indicates that the short-term, initial cost of permeable pavement may, in some cases, be more expensive than impervious pavement, that same

• Notice: During merge phase, we used only (b(R)+b(S))/m (size of two buckets) memory blocks. – Although there are

Understand business rules Identify entities and their attributes Model relationships between entities Apply constraints. Source: Coronel

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1);. private

ALTER TABLE Persons DROP CONSTRAINT chk_PersonId. Limit the value range for

 Table: primary keys of the two participating entity sets and any attributes on the relationship itself (may have dups here).  Primary key: the primary keys added from

(b) If the maximum potential difference between the cloud and earth is 1.0 GV calculate the maximum charge stored and the corresponding energy. Points to watch are

This article employs critical discourse analysis to analyze the representation of the “war on terror” in the political speeches of Presidents George Bush and Barack Obama in