Since SQL operates on sets, there must be such things as "set operations" that SQL can perform.
In fact, SQL borrows all the standard operations from Set Theory textbooks, along with the terminology. Mercifully, it does not also borrow the notation. So to describe the set operations in this book, we've been able to use some English-looking terms which, in fact, have fairly precise meanings in Set Theory and some English-looking terms which, in fact, are SQL words for our notation. In the following quick introduction, we will describe a sequence of set operations. A sequence, also known as a series, is an ordered set. The order that we'llcontinue
Page 17
follow is the order of execution of the clauses of the most famous of SQL statement types: the query or SELECT statement.
Identity
The easiest operation is to take a copy of the table we started with:
NAME ACTIVITY
Georg Cantor Set Theory Kaiser Wilhelm World War I
The SQL for this is:
. . . FROM Germans . . .
Because this is the first time we've said something "in SQL," we'll regard it as our equivalent of the famous "hello world" — so, some introductions are in order. We've used the ellipsis before.
As usual, it means "and so on," so you can see that there's some stuff both before and after FROM Germans that we're not going to spell out just here. We need the ellipses to indicate that we're not illustrating a whole statement, merely one clause called the FROM clause. The word FROM is an SQL keyword and the word Germans is an SQL Table name. So, the FROM clause takes the Germans table and ends up with a result table that is the same as the Germans table.
Product
In the history of the world, there have been four great Mathematicians:
{Al-Khwarezm, Georg Cantor, Leonhard Euler, Rene Descartes}
If we put that side-by-side with our original Germans set, we get:break
MATHEMATICIANS GERMANS
Al-Khwarezm Georg Cantor
Georg Cantor Kaiser Wilhelm
Leonhard Euler
Leonhard Euler Rene Descartes
Page 18
The Cartesian product operation yields the set of all pairs (x,y) such that x is a member of some set X and y is a member of some set Y. The result of GERMANS [Cartesian Product]
MATHEMATICIANS is the following binary relation.
CARTESIAN_PRODUCT
GERMANS.NAME MATHEMATICIANS.NAME
Al-Khwarezm Georg Cantor
Georg Cantor Georg Cantor
Leonhard Euler Georg Cantor
Rene Descartes Georg Cantor
Al-Khwarezm Kaiser Wilhelm
Georg Cantor Kaiser Wilhelm
Leonhard Euler Kaiser Wilhelm
Rene Descartes Kaiser Wilhelm
You must not object that Al-Khwarezm is unrelated to Kaiser Wilhelm because the spirit of a Cartesian-product relation is that everything relates to everything. The table above is a
relation — it's mathematically valid (even if it doesn't seem to make any sense!). There are several ways to express this operation in SQL. The classic style is:
. . . FROM Germans, Mathematicians . . .
That is, within a FROM clause the expression "table-name-1, table-name-2" means "yield a table which is the Cartesian product of table-name-1 and table-name-2."
Search Condition
In our Cartesian product relation, there is something special about the row:
(Georg Cantor, Georg Cantor)
That row, and that row only, has two column values that are the same. This is significant because the columns are defined in a meaningful way; one 'Georg Cantor' is the name of a German, while the other 'Georg Cantor' is the name of a mathematician. Therefore, Georg Cantor must be the only person who is both a German and a mathematician. (This assumes that names are unique.) And again, there is an SQL way to say this:
. . . FROM Germans, Mathematicians
WHERE Germans.name = Mathematician.name . . .
We now have two clauses in our example. The FROM clause exploded the tables into a Cartesian product relation. The WHERE clause reduces the result to a subset of that relation. The result subset is also a relation — it contains only those rows where this condition is TRUE:
"thecontinue
Page 19
name equals the name." (This is known as a search condition in official SQL vocabulary.) Now we have this result:
CARTESIAN_PRODUCTS
GERMANS.NAME MATHEMATICIANS.NAME
Georg Cantor Georg Cantor
So, the WHERE clause contains a search condition, takes as input the (table) result of the FROM clause, and produces as output a table which contains only those rows where the search condition is TRUE.
Join
In this example so far, the FROM clause contained two table names and the WHERE clause contained a comparison between columns from each table. This two-step process is usually called a join. In modern SQL, there are several ways to ask for a join. We have started with the oldest and best-known way because it best illustrates that a typical join is two separate
operations.
Projection
Our result table now has one row with two columns and the value in each column is 'Georg Cantor' — but we only need one column to answer the question "Who is German and a
mathematician?". If you think of the earlier search condition as being an operation which picks out certain rows, it's easy to get to the next step. A projection is a complementary operation which picks out certain columns. In SQL, we just list the columns we want:
SELECT Germans.name
FROM Germans, Mathematicians
WHERE Germans.name = Mathematicians.name . . .
The column reference after the keyword SELECT is called a select list. We now have a result table that looks like this:
CARTESIAN_PRODUCT GERMANS.NAME
Georg Cantor
So, the select list does a projection on the table produced by the WHERE clause. The result is (as always) a table. The words SELECT Germans.name FROM Germans,
Mathematicians WHERE Germans.name = Mathematicians.name constitute a valid and complete SQL statement. Specifically, this sort of statement is called a query,
presumably because it translates a question (in this case, "What Germans are mathematicians?").
It's also commonly called the SELECT statement.break
Page 20
Other Set Operations
SQL also handles the standard set operations intersect, union, and except. The following is an example of each: two example tables (the "input"), one result table (the "output"), and the SQL statement that makes it happen. To make the examples short, we've used unrealistically small examples with one column per table and no WHERE clauses. But don't worry about syntactical details or the exact workings of the operations — that comes later. You understand enough so far if you grasp that some well-known set operations can be expressed in SQL and work on tables.
Inputs SQL query Outputs
Inputs SQL query Outputs
Leonhard Euler is famed for his remark that "(a + b)**n/n = X, therefore God exists," which shows that any argument looks imposing if you use lots of math symbols. SQL does the opposite.
It has a solid base on a mathematical theory, but the operations of Set Theory are hidden behind a somewhat English-like sentence structure. The result is, on balance, for the good. Most SQL beginners have an easy time grasping the concepts behind WHERE clauses or select lists, while they might have a less easy time with the dense polysymbolic notation of standard Set Theory.
Unfortunately, the SQL language hides the operations so well that the following frequent delusions arise:
1. SQL is a "nonprocedural" language. Perhaps people get this idea from the fact that any operation on a set should affect all set members simultaneously. But the set operations themselves are ordered. One step follows another.
2. A whole SQL query operates on the tables in the FROM clause. This mode of thinking causes people to make certain common errors which could be avoided if they kept in mindcontinue
Page 21
the truth; that each set operation produces a new, nameless, "virtual" table and passes it on.
(Well, perhaps we should add "conceptually" — there will be hundreds of times in this book that we could add "conceptually" because your DBMS may do internal things in some out-of-order
sequence, provided the method doesn't affect the results. That is not our present concern. What we must know from the outset is how a human being is supposed to view the operations.) Coming away from this chapter, you only need to know that SQL has a specialized vocabulary which to a large extent arises from Set Theory and that SQL operations happen on tables. You should regard this as background information. This book takes a bottom-up approach, starting with the smallest units (the column and row values) so it will be several chapters before we reach the top and begin to emphasize the sets once more.