FULL OUTER JOIN - teradata-sqlunleash

The last form of the OUTER JOIN is a FULL OUTER JOIN. If both Customer and Order exceptions are to be included in the output report, then the syntax should appear as:

<Outer-table> FULL OUTER JOIN <Outer-table>

<Outer-table> FULL JOIN <Outer-table>

A FULL OUTER JOIN uses both of the tables as outer tables. The exceptions are returned from both tables and the missing column values from either table are extended with NULL.

This puts the LEFT and RIGHT OUTER JOIN output into a single report.

To return the customers with orders, and include the orders without customers and customers without orders, the following FULL OUTER JOIN can be used:

SELECT Customer_name ,Order_number

,Order_total (format '$$$,$$9.99-' )

FROM Customer_table cust FULL OUTER JOIN Order_table ord ON cust.customer_number = ord.customer_number ORDER BY 1 ;

7 Rows Returned

Customer_name Order_number Order_total

? 999999

$1.00-Ace Consulting 123552 $5,111.47

Acme Products ? ?

Billy's Best Choice 123512 $8,005.91

Billy's Best Choice 123456 $12,347.53

Databases N-U 123585 $15,231.62

XYZ Plumbing 123777 $23,454.84

The output from the SELECT consists of all the rows from the Order and Customer tables because they are now both outer tables in a FULL OUTER JOIN.

The total number of rows returned is more difficult to predict with a FULL OUTER JOIN. The answer set contains: one row for each of the matching rows from the tables, plus one row for each of the missing rows in the left table, plus one for each of the missing rows in the right table.

Since both tables are outer tables, not as much thought is required for choosing the outer table. However, as mentioned earlier the INNER and OUTER join processing can be combined in a single SELECT. The INNER JOIN still eliminates all non-matching rows. This is when the most consideration needs to be given to the appropriate outer tables.

Like all joins, more than two tables can be joined using a FULL OUTER JOIN, up to 64 tables. The next FULL OUTER JOIN syntax uses Student and Course tables for the outer tables through the entire join process:

SELECT Last_name (Title 'Last Name') ,First_name AS First

,S.Student_ID

,Course_name AS Course FROM Student_table AS S

FULL OUTER JOIN Student_Course_table AS SC ON S.student_id = SC.student_id

FULL OUTER JOIN Course_table AS C ON C.course_id = SC.course_id ORDER BY Course, Last_name ; 15 Rows Returned

Last Name

First Student_ID Course

Larkins Michael 423400 ?

McRoberts Richard 280023 Advanced

SQL

Wilson Susie 231222 Advanced

SQL

Last

Bond Jimmy 322133 V2R3 SQL

Features

Hanson Henry 125634 V2R3 SQL

Features

Wilson Susie 231222 V2R3 SQL

Features

The above SELECT uses the Student, Course and "Student Course" (associative) tables in a FULL OUTER JOIN. All three tables are outer tables. The above includes one non-matching row from the Student table with a null in the course name and one non-non-matching row from the course table with nulls in all three columns from the Student table. Since the Student Course table is also an outer table, if there were any non-matching rows in it, they can also be returned containing a null in its columns. However, since it is an associative table used only for a many-to-many relationship between the Student and Course tables, missing rows in it would indicate a serious business problem.

As a reminder, the result of the first join step is stored in spool, which is temporary work space that the system uses to complete each step of the SELECT. Then, the spool area is used for each consecutive JOIN step. This continues until all of the tables have been joined together, two at a time. However, the spool areas are not held until the end of the SELECT.

Instead, when the spool is no longer needed, it is released immediately. This makes more spool available for another step, or by another user. The release can be seen in the EXPLAIN output as (Last Use) for a spool area.

Also, when using Teradata, do not spend a lot of time worrying about which tables to join first. The optimizer makes this choice at execution time. The optimizer always looks for the fastest method to obtain the requested rows. It uses data distribution and index

demographics to make its final decision on a methodology. So, the tables joined first in the syntax, might be the last tables joined in the execution plan.

All databases join tables two at a time, but most databases just pick which tables to join based on their position in the FROM. Sometimes when the SQL runs slow, the user just changes the order of the tables in the join. Otherwise, join schemas must be built to tell the RDBMS how to join specific tables.

Teradata is smart enough, using explicit or implicit STATISTICS, to evaluate which tables to join together first. Whenever possible, four tables might be joined at the same time, but it is still done as two, two-table joins in parallel. Joins involving millions of rows are considered difficult for most databases, but Teradata joins them with ease.

It is a good idea to use the Teradata EXPLAIN, to see what steps the optimizer plans to use to accomplish the request. Primarily in the beginning you are looking for an estimate of the number of rows that will be returned and the time cost to accomplish it. I recommend using the EXPLAIN before each join as you are learning to make sure that the result is reasonable.

If these numbers appear to be too high for the tables involved, it is probably a Cartesian product; which is not good. The EXPLAIN discovers the product join within seconds instead of hours. If it were actually running, it would be wasting resources by doing all the extra work to accomplish nothing. Use the EXPLAIN to learn this fact the easy way and fix it.

In document teradata-sqlunleash (Page 114-117)