SAS SQL
Contents
Performing queries using PROC SQL
Performing advanced queries using PROC SQL
Combining tables horizontally using PROC SQL
Combining tables vertically using PROC SQL
Performing Queries Using PROC SQL (1)
Introduction
PROC SQL uses statements that are written in Structured Query Language (SQL), which is a standardized language that is widely used to retrieve and update data in tables and in views that are based on those tables. When you want to examine relationships between data values, subset your data, or compute values, the SQL procedure provides an easy, flexible way to analyze your data.
PROC SQL differs from most other SAS procedures in several ways:
Many statements in PROC SQL, such as the SELECT statement, are composed of clauses.
The PROC SQL step does not require a RUN statement.
PROC SQL continues to run after you submit a step. To end the procedure, you must submit another PROC step, a DATA step, or a QUIT statement.
proc sql outobs=10;
select idnum, jobcode, salary,
salary*.06 as bonus from sql.payroll
where salary<32000 order by jobcode;
quit;
Performing Queries Using PROC SQL (2)
Writing a PROC SQL Step
Before creating a query, you must assign a libref to the SAS data library in which the table to be used is stored. Then you submit a PROC SQL step.
You use the PROC SQL statement to invoke the SQL procedure.
General form, basic PROC SQL step to perform a query:
PROC SQL;
SELECT column-1<,...column-n>
FROM table-1|view-1<,...table-n|view-n>
<WHERE expression>
<GROUP BY column-1<, ... column-n>>
<ORDER BY column-1<,... column-n>>;
where
PROC SQL: invokes the SQL procedure
SELECT: specifies the column(s) to be selected FROM: specifies the table(s) to be queried
WHERE: subsets the data based on a condition
GROUP BY: classifies the data into groups based on the specified column(s) ORDER BY: sorts the rows that the query returns by the value(s) of the
Performing Queries Using PROC SQL (3)
Selecting Columns
To specify which column(s) to display in a query, you write a SELECT clause as the first clause in the SELECT statement. In the SELECT clause, you can specify existing columns and create new columns that contain either text or a calculation.
Specifying the Table
You specify the table to be queried in the FROM clause.
Specifying Subsetting Criteria
To subset data based on a condition, write a WHERE clause that contains an expression.
Ordering Rows
The order of rows in the output of a PROC SQL query cannot be guaranteed, unless you specify a sort order. To sort rows by the values of specific columns, use the ORDER BY clause.
proc sql outobs=10;
select idnum, jobcode, salary,
salary*.06 as bonus from sql.payroll
where salary<32000 order by jobcode;
Performing Queries Using PROC SQL (4)
Querying multiple tables (1)
You can use a PROC SQL step to query data that is stored in two or more tables. In SQL terminology, this is called joining tables.
Follow these steps to join multiple tables:
Specify column names from one or both tables in the SELECT clause and, if you are selecting a column that has the same name in multiple tables, prefix the table name to that column name.
Specify each table name in the FROM clause.
Use the WHERE clause to select rows from two or more tables, based on a condition.
Use the ORDER BY clause to sort rows that are retrieved from two or more tables by the values of the selected column(s).
Performing Queries Using PROC SQL (5)
Querying multiple tables (2)
proc sql;
select payroll.idnum, payroll.salary, payroll2.salary as newsalary
from sql.payroll, sql.payroll2
where payroll.idnum=payroll2.idnum order by payroll.idnum;
quit;
Performing Queries Using PROC SQL (6)
Summarizing groups of data
You can use a GROUP BY clause in your PROC SQL step to
summarize data in groups. The GROUP BY clause is used in queries that include one or more summary functions. Summary functions produce a statistical summary for each group that is defined in the GROUP BY clause.
proc sql;
select date,
sum(boarded) as DayCarry from sql.march
group by date;
quit;
Performing Queries Using PROC SQL (7)
Summary Functions Definition
AVG, MEAN mean or average of values COUNT, FREQ, N number of nonmissing values
CSS corrected sum of squares
CV coefficient of variation (percent)
MAX largest value
MIN smallest value
NMISS number of missing values
PRT probability of a greater absolute value of Student’s t
RANGE range of values
STD standard deviation
STDERR standard error of the mean
SUM sum of values
SUMWGT sum of the WEIGHT variable values
T Student’s t value for testing the hypothesis that the population mean is zero
USS uncorrected sum of squares
Performing Queries Using PROC SQL (8)
Creating output tables (1)
To create a new table from the results of your query, you can use the CREATE TABLE statement in your PROC SQL step. This statement
enables you to store your results in a table instead of displaying the query results as a report.
General form, basic PROC SQL step for creating a table from a query result:
PROC SQL;
CREATE TABLE table-name AS SELECT column-1<,...column-n>
FROM table-1|view-1<,...table-n|view-n>
<WHERE expression>
<GROUP BY column-1<,... column-n>>
<ORDER BY column-1<,... column-n>>;
where
table-name: specifies the name of the table to be created.
Performing Queries Using PROC SQL (9)
Creating output tables (2)
proc sql;
create table marchdc as select date,
sum(boarded) as DayCarry from sql.march
group by date;
quit;
Because the CREATE TABLE statement is used, this query does not create a report. The SAS log verifies that the table was created and indicates how many rows and columns the table contains.
Performing Queries Using PROC SQL (10)
Additional features
To further refine a PROC SQL query that contains a GROUP BY clause, you can use a HAVING clause. A HAVING clause works with the GROUP BY clause to restrict the groups that are displayed in the output, based on one or more specified conditions.
proc sql;
select jobcode, avg(salary) as Avg from sql.payroll
group by jobcode
having avg(salary)>40000 order by jobcode;
quit;
Viewing SELECT statement syntax
When you construct a SELECT statement, you must specify the clauses in the following order:
1.SELECT 2.FROM 3.WHERE 4.GROUP BY 5.HAVING 6.ORDER BY
Note: Only the SELECT and FROM clauses are required.
Performing Advanced Queries Using PROC SQL (1)
Displaying all columns
To display all columns in the order in which they are stored in the table, use an asterisk (*) in the SELECT clause. To write the
expanded list of columns to the SAS log, use the FEEDBACK option in the PROC SQL statement.
proc sql;
select *
from sql.payroll2;
Using the FEEDBACK Option
proc sql feedback;
select *
from sql.payroll2;
Performing Advanced Queries Using PROC SQL (2)
Limiting the number of rows displayed
To limit the number of rows that PROC SQL displays as output, use the OUTOBS=n option in the PROC SQL statement.
General form, PROC SQL statement with OUTOBS= option:
PROC SQL OUTOBS= n;
Where n : specifies the number of rows.
proc sql outobs=10;
select idnum, salary from sql.payroll;
Performing Advanced Queries Using PROC SQL (3)
Eliminating duplicate rows from output
To eliminate duplicate rows from your query results, use the keyword DISTINCT in the SELECT clause.
proc sql;
select distinct flight, miles from sql.march
order by 1;
quit;
Performing Advanced Queries Using PROC SQL (4)
Performing Advanced Queries Using PROC SQL (5)
Subsetting rows by using conditional operators (1)
In a PROC SQL query, use the WHERE clause with any valid SAS expression to subset data. The SAS expression can contain one or more operators, including the following conditional operators:
the BETWEEN-AND operator selects within an inclusive range of values
where salary between 70000 and 80000
the CONTAINS or ? operator selects a character string where name contains ’ER’
where name ? ’ER’
the IN operator selects from a list of fixed values where code in (’PT’ , ’NA’, ’FA’)
the IS MISSING or IS NULL operator selects missing values
where dateofbirth is missing where dateofbirth is null
Subsetting rows by using conditional operators (2)
Using the LIKE Operator to Select a Pattern
proc sql;
select *
from sql.staff
where lname like '%SON';
quit;
The pattern %SON’ specifies the following sequence:
any number of characters (%) the string PLACE.
Special Character Represents
underscore ( _ ) any single character percent sign (%) any sequence of zero or
more characters
LIKE Pattern Name(s) Selected LIKE 'D_an' Dyan
LIKE 'D_an_' Diana, Diane LIKE 'D_an__' Dianna
LIKE 'D_an%' all names from the list
Performing Advanced Queries Using PROC SQL (6)
Subsetting rows by using calculated values
It is important to understand how PROC SQL processes
calculated columns. When you use a column alias in the WHERE or the HAVING clause to refer to a calculated value, you must also use the keyword CALCULATED along with the alias.
proc sql outobs=10;
select idnum, jobcode, salary, salary*.06 as bonus
from sql.payroll
where calculated bonus < 1000 order by jobcode;
quit;
Performing Advanced Queries Using PROC SQL (7)
Enhancing query output
You can enhance PROC SQL query output by using SAS enhancements such as column formats and labels, titles and footnotes, and character
constraints.
proc sql outobs=15;
title 'Current Bonus Information';
title2 'Employees with Salaries > $75,000';
select idnum label='Employee ID', jobcode label='Job Code', salary,
'bonus is:',
salary * .10 format=dollar12.2 from sql.payroll
where salary>75000 order by salary desc;
quit;
Performing Advanced Queries Using PROC SQL (8)
Points to remember (1)
COUNT values:
Counting Nonmissing Values: select count (Country) as Number Counting All Rows: select count(*) as Number
Table aliases are usually optional. However, there are two situations that require their use.
a table is joined to itself (called a self-join or reflexive join)
from airline.staffmaster as s1, airline.staffmaster as s2
you need to reference columns from same-named tables in different libraries
from airline.flightdelays as af, work.flightdelays as wf
where af.delay > wf.delay
Performing Advanced Queries Using PROC SQL (9)
Points to remember (2)
Unlike in PROC SORT, “desc” is instead of “descending” and is used behind the variable.
order by Type desc, Name;
You can sort by any column within the SELECT clause by
specifying its numerical position. By specifying a position instead of a name, you can sort by a calculated column that has no alias.
select Name, Population, Area,
Population/Area label=’Density’
from countries order by 4;
Performing Advanced Queries Using PROC SQL (10)
Points to remember (3)
If you specify a GROUP BY clause in a query that does not
contain a summary function, your clause is changed to an ORDER BY clause, and a message to that effect is written to the SAS log.
proc sql;
select *
from sql.payroll2 group by jobcode;
quit;
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression
referenced a summary function.
Performing Advanced Queries Using PROC SQL (11)
Combining Tables Horizontally Using PROC SQL
Contents
Understanding joins
Generating a cartesian product Using inner joins
Using outer joins
Comparing SQL joins and DATA step match-merges Understanding the advantages of PROC SQL joins
Understanding Joins
Joins combine tables horizontally (side by side) by
combining rows. The tables being joined are not required to
have the same number of rows or columns.
Generating a Cartesian Product
When you specify multiple tables in the FROM clause but do not
include a WHERE statement to
subset data, PROC SQL returns the Cartesian product of the tables. In a Cartesian product, each row in the first table is combined with every row in the second table.
proc sql;
select *
from one, two;
3 3
9
Using Inner Joins
An inner join combines and
displays only the rows from the first table that match rows from the
second table, based on the matching criteria (also known as join conditions) that are specified in the WHERE
clause. Note: An inner join is
sometimes called a conventional join.
You can combine a maximum of 32 tables in a single inner join.
proc sql;
select *
from one, two
where one.x = two.x;
Using Outer Joins (1)
An outer join combines and displays all rows that match across tables, based on the specified matching criteria
(also known as join conditions), plus some or all of the rows that do not match.
Type of outer join
Left: All matching rows plus nonmatching rows from the first table specified in the FROM clause (the left table)
Right: All matching rows plus nonmatching rows from the second table specified in the FROM clause (the right table)
Full: All matching rows plus nonmatching rows in both tables
Using Outer Joins (2)
Examples of outer joins
Left join:
proc sql;
select *
from one left join two on one.x=two.x;
Right join:
proc sql;
select *
from one right join two on one.x=two.x;
Full join:
proc sql;
select *
from one full join two on one.x=two.x;
Note: An inner join that uses this syntax can be performed on only two tables or views at a time. When an inner
Comparing SQL Joins and DATA Step Match-Merges (1)
Let’s compare the use of SQL joins and DATA step match-merges in the following situations:
when all of the values of the selected variable (column) match when only some of the values of the selected variable (column) match.
When all of the values match
When all of the values of the BY variable match, you can use a PROC SQL inner join to produce the same results as a DATA step match-merge.
DATA step match-merge
data merged;
merge one two;
by x;
run;
PROC SQL inner join
proc sql;
select one.x, a, b from one, two
where one.x = two.x order by x;
Comparing SQL Joins and DATA Step Match-Merges (2)
When only some of the values match
Unlike the DATA step match-merge, however, a PROC SQL outer join does not overlay the two common columns by default.
DATA step match-merge
data merged;
merge three four;
by x;
run;
PROC SQL full outer join
proc sql;
select three.x, a, b
from three full join four on three.x = four.x order by x;
Comparing SQL Joins and DATA Step Match-Merges (3)
When only some of the values match: using the COALESCE function
When you add the COALESCE function to the SELECT clause of the PROC SQL outer join, the PROC SQL outer join can produce the same result as a DATA step match-merge.
PROC SQL full outer join proc sql;
select coalesce(three.x, four.x) as X, a, b from three full join four
on three.x = four.x;
Comparing SQL Joins and DATA Step Match-Merges (4)
Understanding the Advantages of PROC SQL Joins
PROC SQL joins do not require sorted tables.
PROC SQL joins do not require that the columns in join expressions have the same name.
proc sql;
select table1.x, lastname, status from table1, table2
where table1.id = table2.custnum;
PROC SQL joins can use comparison operators other than the equal sign (=).
proc sql;
select a.itemnumber, cost, price from table1 as a,
table2 as b
where a.itemnumber = b.itemnumber and a.cost>b.price;
Combining Tables Vertically Using PROC SQL
Contents
Introducing set operators
Using the EXCEPT set operator Using the INTERSECT set operator Using the UNION set operator
Using the OUTER UNION set operator
Producing rows from the first query or the second query
Introducing Set Operators (1)
Understanding set operations
A set operation is a SELECT statement that contains
two groups of query clauses (each group beginning with a SELECT clause)
a set operator
optionally, one or both of the keywords ALL and CORR (CORRESPONDING).
Introducing Set Operators (2)
EXCEPT: Selects unique rows from the first table that are not found in the second table.
INTERSECT: Selects unique rows that are common to both tables.
UNION: Selects unique rows from one or both tables.
OUTER UNION: Selects all rows from both tables.
Introducing Set Operators (3)
Processing unique vs. duplicate rows
When processing a set operation that displays only unique rows (a set operation that contains the set operator EXCEPT, INTERSECT, or UNION), PROC SQL makes two passes through the data, by default:
PROC SQL eliminates duplicate (nonunique) rows in the tables.
PROC SQL selects the rows that meet the criteria and, where requested, overlays columns.
Using the EXCEPT Set Operator (1)
Using the EXCEPT operator alone (1)
Suppose you want to display the unique rows in table One that are not found in table Two.
proc sql;
select *
from one except
select *
from two;
Using the EXCEPT Set Operator (2)
Using the EXCEPT operator alone (2)
The set operator EXCEPT overlays
columns by their position. In this output, the following columns are overlaid:
the first columns, One.X and Two.X, both of which are numeric
the second columns, One.A and Two.B, both of which are character.
The column names from table One are used, so the second column of output is named A rather than B.
Using the EXCEPT Set Operator (3)
Using the EXCEPT operator alone (3)
Let’s take a closer look at this example to see exactly how PROC SQL selects rows from table One to display in output.
In the first pass, PROC SQL eliminates any duplicate rows from the tables. There is one duplicate row: in table One, the second row is a duplicate of the first row. All remaining rows in table One are still candidates in PROC SQL’s selection process.
In the second pass, PROC SQL identifies any rows in table One for which there is a matching row in table Two and eliminates them. There is one matching row in the two tables, as shown below, which is eliminated.
Using the EXCEPT Set Operator (4)
Combining and overlaying columns
By default, the set operators EXCEPT, INTERSECT, and UNION overlay columns based on the relative position of the columns in the SELECT clause. Column names are ignored. You control how PROC SQL maps columns in one table to columns in another table by
specifying the columns in the appropriate order in the SELECT
clause. The first column specified in the first query’s SELECT clause and the first column specified in the second query’s SELECT clause are overlaid, and so on.
Using the EXCEPT Set Operator (5)
Modifying results by using keywords
ALL
Makes only one pass through the data and does not remove duplicate rows.
CORR (or CORRESPONDING)
Compares and overlays columns by name instead of by position:
When used with EXCEPT, INTERSECT, and UNION, removes any columns that do not have the same name in both tables.
When used with OUTER UNION, overlays same-named columns and displays columns that have nonmatching names without overlaying.
Using the EXCEPT Set Operator (6)
Using the keyword ALL with the EXCEPT operator (1)
To select all rows in the first table (both unique and duplicate) that do not have a matching row in the second table, add the keyword ALL after the EXCEPT set operator.
proc sql;
select *
from one except all select *
from two;
Using the EXCEPT Set Operator (7)
Using the keyword ALL with the EXCEPT operator (2)
PROC SQL has again eliminated the one row in table One (the fifth row) that has a matching row in table Two (the fourth row).
Remember that when the keyword ALL is used with the EXCEPT operator, PROC SQL does not make an extra pass through the data to remove duplicate rows within table One.
Therefore, the second row in table One, which is a duplicate of the first row, is now included in the output.
Using the EXCEPT Set Operator (8)
Using the keyword CORR with the EXCEPT operator (1)
To display both of the following, add the keyword CORR after the set
operator.
only columns that have the same name
all unique rows in the first table that do not appear in the second table.
proc sql;
select * from one except corr select *
Using the EXCEPT Set Operator (9)
Using the keyword CORR with the EXCEPT operator (2)
In the first pass, PROC SQL
eliminates the second and third rows of table One from the output because they are not unique within the table; they contain values of X that duplicate the value of X in the first row of table One.
Note that column A is not used.
In the second pass, PROC SQL eliminates the first, fourth, and fifth rows of table One because each contains a value of X that matches a value of X in a row of table Two. The output displays the two remaining rows in table One, the rows that are unique in table One and that do not have a row in table Two that has a matching value of X.
Using the EXCEPT Set Operator (10)
Using the keywords ALL and
CORR with the EXCEPT operator (1)
If the keywords ALL and CORR are used together, the EXCEPT operator will display all unique and duplicate rows in the first table that do not appear in the second table, and will overlay and display only columns that have the same name.
proc sql;
select * from one
except all corr select *
from two;
Using the EXCEPT Set Operator (11)
Using the keywords ALL and CORR with the EXCEPT operator (2)
Because the ALL keyword is used, PROC SQL does not eliminate any duplicate rows in table One. PROC SQL does eliminate the first, fourth, and fifth rows in table One from the
output because for each one of these three rows there is a corresponding row in table Two that has a matching value of X.
Table One contains three rows in which the value of X is 1, but table Two contains only one row in which the value of X is 1. That one row in table Two only causes the first of the three rows in table One that have a matching value of X to be eliminated from the output.
Using the INTERSECT Set Operator (1)
Using the INTERSECT operator Alone
The INTERSECT operator compares and overlays columns in the sameway as the EXCEPT operator, by column position instead of column name. However,
INTERSECT selects rows differently and displays in output the unique rows that are common to both tables.
proc sql;
select * from one intersect select *
from two;
Using the keyword ALL with the INTERSECT operator
Adding the keyword ALL to the preceding PROC SQL query prevents PROC SQL from making an extra pass through the data. If there were any rows common to tables One and Two that were duplicates of other common rows, they would also be included in output.
proc sql;
select * from one intersect all select *
from two;
Using the INTERSECT Set Operator (2)
Using the keyword CORR with the INTERSECT operator
To display the unique rows that are common to the two tables
based on the column name instead of the column position, add the
CORR keyword to the PROC SQL set operation.
Note that column A and B are not used.
proc sql;
select * from one intersect corr select *
from two;
Using the INTERSECT Set Operator (3)
Using the keywords ALL and CORR with the INTERSECT operator
If the keywords ALL and CORR are used together, the INTERSECT
operator will display all unique and nonunique (duplicate) rows that are common to the two tables, based on columns that have the same name.
proc sql;
select * from one
intersect all corr select *
from two;
Using the INTERSECT Set Operator (4)
Using the UNION Set Operator (1)
Using the UNION operator alone
To display all rows from the tables One and Two that are unique in the combined set of rows from both tables, use a PROC SQL set operation that includes the UNION operator:
proc sql;
select * from one union
select *
from two;
Using the UNION Set Operator (2)
Using the keyword ALL with the UNION operator
When the keyword ALL is added to the UNION operator, the output displays all rows from both tables, both unique and duplicate.
proc sql;
select * from one union all select *
from two;
Using the UNION Set Operator (3)
Using the keyword CORR with the UNION operator
To display all rows from the tables One and Two that are unique in the combined set of rows from both tables, based on columns that have the same name rather than the same position, add the keyword CORR after the set operator.
proc sql;
select * from one union corr select *
from two;
Using the keywords ALL and CORR with the UNION operator
If the keywords ALL and CORR are used together, the UNION
operator will display all rows in the two tables both unique and
duplicate, based on the columns that have the same name.
proc sql;
select * from one union all corr select *
from two;
Using the UNION Set Operator (4)
Using the OUTER UNION Set Operator (1)
The set operator OUTER
UNION concatenates the results of the queries by
selecting all rows (both unique and nonunique) from both tables
not overlaying columns.
Let’s see how OUTER UNION works when used alone and with the keyword CORR. The ALL
keyword is not used with OUTER UNION because this operator’s default action is to include all rows in output.
Using the OUTER UNION Set Operator (2)
Using the OUTER UNION operator alone
Suppose you want to display all rows from both of the tables One and Two, without
overlaying columns.
proc sql;
select * from one outer union select *
from two;
In the output, the columns have not been overlaid. Instead, all four columns from both tables are displayed. Each row of output
contains missing values in the two columns that correspond to the other table.
Using the keyword CORR with the OUTER UNION operator
The output from the preceding set operation contains two columns with the same name. To overlay the columns with a common name, add the CORR keyword to the set operation:
proc sql;
select * from one
outer union corr select *
from two;
The output from the modified set operation contains only three columns, because the two columns named X are overlaid.
Using the OUTER UNION Set Operator (3)
Using the OUTER UNION Set Operator (4)
Program 1: PROC SQL OUTER UNION Set Operation with CORR
proc sql;
create table three as select * from one outer union corr select * from two;
quit;
Program 2: DATA Step, SET Statement, and PROC PRINT Step
data three;
set one two;
run;
proc print data=three noobs;
run;
Comparing Outer Unions and other SAS techniques (1)
A PROC SQL set operation that uses the OUTER UNION operator is just one SAS technique that you can use to concatenate tables.
These two programs create the same table as output.
Using the OUTER UNION Set Operator (5)
Comparing Outer Unions and other SAS techniques (2)
When tables have a same-named column, the PROC SQL outer union will not produce the
same output unless the keyword CORR is also used. CORR causes the same-named columns to be overlaid; without CORR, the OUTER
UNION operator will include both of the same- named columns in the result set. The DATA step program will generate only one column X.
The two concatenation techniques shown above also vary in efficiency. A PROC SQL set operation generally requires more computer resources but may be more convenient and flexible than the DATA step equivalent.
Producing Rows from the First Query or the Second Query
There is no keyword in PROC SQL that returns unique rows from the first and second table, but not rows that occur in both.
Here is one way you can simulate this operation:
(query1 except query2) union
(query2 except query1)
This example shows how to use this operation.
proc sql;
(select * from sql.a except
select * from sql.b) union
(select * from sql.b except
select * from sql.a);
The first EXCEPT returns one unique row from the first table (table A) only. The second EXCEPT returns one unique row from the second table (table B) only. The middle UNION combines the two results.