datalab Databases

40  Download (0)

Full text

(1)

Databases

08. C. Joins (ch 6.2)

2018. Fall Instructor: Joonho Kwon jhkwon@pusan.ac.kr Data Science Lab @ PNU

datalab

(2)

Joins: basics

Joins: SQL semantics

Outline

(3)

Join: Basics

(4)

Interesting queries often combine data from more than one relation.

We can address several relations in one query by listing them all in the FROM clause.

Distinguish attributes of the same name by

“<relation>.<attribute>” .

Multi-table Queries

(5)

Join (1/2)

EMPNO ENAME …… DEPTNO

7839 KING 10

7566 JONES 20

7900 JAMES 30

7369 SMITH 20

7499 ALLEN 30

DEPTNO DNAME LOC

10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATION BOSTON FK PK

EMPNO ENAME …… DEPTNO DEPTNO DNAME LOC

7839 KING 10 10 ACCOUNTING NEW YORK

7566 JONES 20 20 RESEARCH DALLAS

7900 JAMES 30 30 SALES CHICAGO

7369 SMITH 20 20 RESEARCH DALLAS

7499 ALLEN 30 30 SALES CHICAGO

EMP DEPT

(6)

Joins that are compliant with the SQL:1999 standard include the following:

Natural joins:

NATURAL JOIN clause

USING clause

ON clause

OUTER joins:

LEFT OUTER JOIN

RIGHT OUTER JOIN

FULL OUTER JOIN

Types of Joins

(7)

Use a join to query data from more than one table:

Joining Tables Using SQL:1999 Syntax

SELECT table1.column, table2.column FROM table1

[NATURAL JOIN table2] |

[JOIN table2 USING (column_name)] |

[JOIN table2 ON (table1.column_name = table2.column_name)]|

[LEFT|RIGHT|FULL OUTER JOIN table2

ON (table1.column_name = table2.column_name)]|

[CROSS JOIN table2];

(8)

Use table prefixes to qualify column names that are in multiple tables.

Use table prefixes to improve performance.

Instead of full table name prefixes, use table aliases.

Table alias gives a table a shorter name:

Keeps SQL code smaller, uses less memory

Use column aliases to distinguish columns that have identical names, but reside in different

Qualifying Ambiguous Column Names

(9)

A Cartesian product is formed when:

A join condition is omitted

A join condition is invalid

All rows in the first table are joined to all rows in the second table

To avoid a Cartesian product, always include a valid join condition.

Cartesian Products (Cross Join)

(10)

The CROSS JOIN clause produces the cross- product of two tables.

This is also called a Cartesian product between the two tables.

Creating Cross Joins

SELECT ename, dname

FROM emp CROSS JOIN dept;

SELECT ename, dname FROM emp, dept;

(11)

The NATURAL JOIN clause is based on all columns in the two tables that have the same name.

It selects rows from the two tables that have equal values in all matched columns.

If the columns having the same names have different data types, an error is returned

Creating Natural Joins

SELECT ename, dname

FROM emp NATURAL JOIN dept;

(12)

If several columns have the same names but the data types do not match, use the USING clause to specify the columns for the equijoin.

Use the USING clause to match only one column when more than one column matches.

Creating Joins with the USING Clause

SELECT employee_id, last_name, location_id, department_id FROM employees JOIN departments

USING (department_id) ;

(13)

The join condition for the natural join is basically an equijoin of all columns with the same name.

Use the ON clause to specify arbitrary conditions or specify columns to join.

The join condition is separated from other search conditions.

The ON clause makes code easy to understand

Creating Joins with the ON Clause (1/3)

SELECT ename, dname FROM emp JOIN dept ON deptno = deptno;

(14)

Table Alias

Oracle can’t distinguish which table’s column is used in the query if tables have the same column name.

If you run the above query, Oracle will raise an error of ambiguous column.

Creating Joins with the ON Clause (2/3)

SELECT ename, dname FROM emp JOIN dept ON deptno = deptno;

(15)

Table Alias

You should write a table name in front of a column name in the join query if the tables have the same column name.

You can use table alias for clearing which table has the column instead of table full name.

Creating Joins with the ON Clause (3/3)

SELECT ename, dname

FROM emp e JOIN dept d ON e.deptno = d.deptno;

SELECT e.ename, d.dname FROM emp e, dept d

WHERE e.deptno = d.deptno;

(16)

Use the AND clause or the WHERE clause to apply additional conditions

Applying Additional Conditions to a Join

SELECT ename, dname, e.deptno FROM emp e JOIN dept d

ON e.deptno = d.deptno WHERE e.deptno=30;

SELECT ename, dname, e.deptno FROM emp e JOIN dept d

ON e.deptno = d.deptno

(17)

Non-equijoin

EMPNO ENAME SAL ……

7839 KING 5000 7566 JONES 2975 7900 JAMES 950 7369 SMITH 800 7499 ALLEN 1600

GRADE LOSAL HISAL

1 700 1200

2 1201 1400

3 1401 2000

4 2001 3000

5 3001 9999

EMPNO ENAME SAL …… GRADE LOSAL HISAL

7839 KING 5000 5 3001 9999

7566 JONES 2975 4 2001 3000

7900 JAMES 950 1 700 1200

7369 SMITH 800 1 700 1200

7499 ALLEN 1600 3 1401 2000

EMP SALGRADE

SALGRADE table defines the LOSAL and HISAL range of values for each GRADE.

Hence, the GRADE column can be used to assign grades to each employee.

(18)

If a join condition is containing something other than an equality comparison operator, the query is a non-equijoin.

Non-equijoin

SELECT *

FROM emp AS e JOIN salgrade s

ON e.sal BETWEEN s.losal AND s.hisal;

(19)

Joining a Table to Itself

EMPNO ENAME MGR ……

7839 KING

7566 JONES 7839 7900 JAMES 7698 7369 SMITH 7902 7499 ALLEN 7698

EMPNO ENAME MGR …… EMPNO ENAME

7566 JONES 7839 7839 KING

7900 JAMES 7698 7698 BLAKE

7369 SMITH 7902 7902 FORD

7499 ALLEN 7698 7698 BLAKE

EMP

EMPNO ENAME ……

7839 KING 7566 JONES 7900 JAMES 7369 SMITH 7499 ALLEN 7698 BLAKE 7902 FORD PK EMP

FK

(20)

A self join is a join of a table to itself.

This table appears twice in the FROM clause and is followed by table aliases that qualify column names in the join condition.

Self join

SELECT e.ename, mgr.ename AS Manager FROM emp e JOIN emp mgr

ON e.mgr = mgr.empno;

(21)

An outer join

returns all rows that satisfy the join condition and those rows from one table for which no rows from the other satisfy the join condition.

To write a query that performs an outer join of tables A and B and returns all rows from A, apply the outer join operator (+) to all columns of B in the join condition..

Returning Records with No Direct Match

Using OUTER Joins

(22)

Returning Records with No Direct Match Using OUTER Joins

EMPNO ENAME …… DEPTNO

7839 KING 10

7566 JONES 20

7900 JAMES 30

7369 SMITH 20

DEPTNO DNAME LOC

10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATION BOSTON PK

FK

EMPNO ENAME …… DEPTNO DEPTNO DNAME LOC

7839 KING 10 10 ACCOUNTING NEW YORK

7566 JONES 20 20 RESEARCH DALLAS

7900 JAMES 30 30 SALES CHICAGO

EMP DEPT

There are no employees in department 40.

(23)

In SQL:1999, the join of two tables returning only matched rows is called an INNER join.

A join between two tables that returns the results of the INNER join as well as the

unmatched rows from the left (or right) table is called a left (or right) OUTER join.

A join between two tables that returns the

results of an INNER join as well as the results of a left and right join is a full OUTER join.

INNER Versus OUTER Joins (1/3)

(24)

The SQL INNER JOIN

returns the records where table1 and table2 intersect.

The SQL LEFT OUTER JOIN

return the all records from table1 and only those

INNER Versus OUTER Joins (2/3)

(25)

SQL RIGHT OUTER JOIN

returns the all records from table2 and only those records from table1 that intersect with table2.

SQL FULL OUTER JOIN

returns the all records from both table1 and table2.

INNER Versus OUTER Joins (3/3)

(26)

Outer Join

EMPNO ENAME …… DEPTNO

7839 KING 10

7566 JONES 20

7900 JAMES 30

7369 SMITH 20

DEPTNO DNAME LOC

10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATION BOSTON PK

FK

EMPNO ENAME …… DEPTNO DEPTNO DNAME LOC

7839 KING 10 10 ACCOUNTING NEW YORK

7566 JONES 20 20 RESEARCH DALLAS

7900 JAMES 30 30 SALES CHICAGO

7369 SMITH 20 20 RESEARCH DALLAS

7499 ALLEN 30 30 SALES CHICAGO

EMP DEPT

(27)

Right outer join

Outer Join Practices (1/2)

SELECT ename, dname FROM emp e, dept d

WHERE e.deptno (+)= d.deptno;

SELECT ename, dname

FROM emp RIGHT OUTER JOIN dept USING (deptno);

(28)

LEFT/FULL outer join practice

Outer Join Practices (2/2)

INSERT INTO emp (empno, ename) VALUES (4444,user);

SELECT * FROM emp;

SELECT ename, dname

FROM emp LEFT OUTER JOIN emp USING (deptno);

SELECT ename, dname

FROM dept FULL OUTER JOIN emp USING (deptno);

(29)

Join: Sematics

(30)

Meaning (Semantics) of SQL Queries

SELECT x1.a1, x1.a2, …, xn.ak

FROM R1 AS x1, R2 AS x2, …, Rn AS xn WHERE Conditions(x1,…, xn)

Answer = {}

for x1 in R1 do for x2 in R2 do

…..

for xn in Rn do

if Conditions(x1,…, xn)

then Answer = Answer  {(x1.a1, x1.a2, …, xn.ak)}

return Answer

Almost never the fastest way to compute it!

(31)

An example of SQL semantics

SELECT R.A FROM R, S

WHERE R.A = S.B

A

1 3

B C

2 3 3 4 3 5

A B C

1 2 3 1 3 4 1 3 5 3 2 3 3 3 4 3 3 5

Cross Product

A B C

3 3 4 3 3 5

A

3 3

Apply Projection Apply

Selections / Conditions

Output

(32)

1.

Take cross product:

𝑋 = 𝑅 × 𝑆

2.

Apply selections / conditions:

𝑌 = 𝑟, 𝑠 ∈ 𝑋 𝑟. 𝐴 == 𝑟. 𝐵}

3.

Apply projections to get final output:

𝑍 = (𝑦. 𝐴, ) 𝑓𝑜𝑟 𝑦 ∈ 𝑌

Note the semantics of a join

SELECT R.A

FROM R, S

WHERE R.A = S.B

Recall: Cross product (A X B) is the set of all unique tuples in A,B

Ex: {a,b,c} X {1,2}

= {(a,1), (a,2), (b,1), (b,2), (c,1), (c,2)}

= Filtering!

= Returning only some attributes

(33)

The preceding slides show what a join means

Not actually how the DBMS executes it under the covers

Note: we say “semantics” not “execution

order”

(34)

A Subtlety about Joins

Find all countries that manufacture some product in the ‘Gadgets’ category.

SELECT Country

FROM Product, Company

WHERE Manufacturer=CName AND Category=‘Gadgets’

Product(PName, Price, Category, Manufacturer) Company(CName, StockPrice, Country)

(35)

A subtlety about Joins

PName Price Category Manuf Gizmo $19 Gadgets GWorks Powergizmo $29 Gadgets GWorks SingleTouch $149 Photography Canon

MultiTouch $203 Household Hitachi

Product Company

Cname Stock Country GWorks 25 USA

Canon 65 Japan Hitachi 15 Japan

Country

?

?

SELECT Country

FROM Product, Company

WHERE Manufacturer=Cname AND Category=‘Gadgets’

What is the problem ? What’s the solution ?

(36)

An Unintuitive Query

SELECT DISTINCT R.A FROM R, S, T

WHERE R.A=S.A OR R.A=T.A

What does it compute?

(37)

An Unintuitive Query

SELECT DISTINCT R.A FROM R, S, T

WHERE R.A=S.A OR R.A=T.A

Computes R  (S  T)

But what if S = f?

S T

R

Go back to the semantics!

(38)

Recall the semantics!

1. Take cross-product

2. Apply selections / conditions

3. Apply projection

If S = {}, then the cross product of R, S, T = {}, and the query result = {}!

An Unintuitive Query

SELECT DISTINCT R.A FROM R, S, T

WHERE R.A=S.A OR R.A=T.A

(39)

Joins

Cross joins

Natural joins / Equijoins

Nonequijoins

Self-joins

OUTER joins

Full (or two-sided) OUTER joins

Summary

(40)

Q&A

Figure

Updating...

References

Related subjects :