• No results found

Outline. SAS-seminar Proc SQL, the pass-through facility. What is SQL? What is a database? What is Proc SQL? What is SQL and what is a database

N/A
N/A
Protected

Academic year: 2021

Share "Outline. SAS-seminar Proc SQL, the pass-through facility. What is SQL? What is a database? What is Proc SQL? What is SQL and what is a database"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

SAS-seminar

Proc SQL, the pass-through facility

How to make your data processing someone

else’s problem

Outline

What is SQL and what is a database

Quick introduction to Proc SQL

The pass-through facility

Examples…

SAS SAS SAS SAS/Proc SQL

What is SQL?

SQL = Structured Query Language.

Developed by IBM in the 1970s for querying, modifying and updating relational

databases

Adopted and standardized by ANSI and ISO during the 1980s

Used with (almost) all RDBMS (Relational Data Base Management System) e.g. Oracle, DB2, Access, MySQL etc.

Most RDBMS also offer enhancements to ANSI SQL

SAS SAS SAS SAS/Proc SQL

What is a database?

“A database is a collection of logically related data and a description of this data, designed to meet the need for information in an organization.”

For me, a database is just a collection of tables…

SAS SAS SAS SAS/Proc SQL

What is Proc SQL?

Proc SQL is a SAS implementation of SQL

It can be conceptualized as a

combination of the data step, proc summary and proc sort, all at once

Using Proc SQL you can execute SQL queries on a remote database server

(2)

Proc SQL intro - terminology

Union Append

Join Merge

Row Observation

Column Variable

Table Dataset

Proc SQL Proc SQLProc SQL Proc SQL SAS Data step

SAS Data stepSAS Data step SAS Data step

Proc SQL syntax

From the name follows that it is extremely structured

It is highly scalable and can be used for a multitude of different tasks –

anything from simple counts to complicated joins of multiple tables and sub queries

It typically cannot do anything that ordinary SAS can’t…

SAS SAS SAS SAS/Proc SQL

Proc SQL syntax - selecting

proc proc proc proc sqlsqlsqlsql;;;;

create table tablename as select [distinct]

column1, column2, [*], … from library.table where expression order by column1;

quit quit quit quit;

* = all columns

data tablename;

keep column1, column2;

set library.table;

where expression;

proc sort;

by column1;

run;

SAS SAS SAS SAS/Proc SQL

SQL example 1 - selecting

procproc

procproc sqlsqlsqlsql;

create table womenas select

*

from cblood.persons where sex = 2 order by birthdate;

quit quitquit quit;

* = all variables

SAS SAS SAS SAS/Proc SQL

SQL example 2 - selecting

proc

proc proc proc sqlsqlsqlsql;;;;

create table patientsas select distinct

idnr

from cblood.transfusion; quitquit

quitquit;

distinct no duplicate values

SAS SAS SAS SAS/Proc SQL

Proc SQL syntax - modifying

proc sql;

create table tablenameas select

function(column1) as new1,

column2[+|-|*|/] column3as new2 from library.table;

quit quitquit quit;

(3)

SQL example 3 - modifying

proc proc proc proc sqlsqlsql;;;;sql

create table dead as select

idnr,

(deathdate-birthdate)/365.24 as age from cblood.persons

where not deathdate is null;

quit;

quit;

quit;

quit;

Proc SQL syntax - summarizing

proc proc proc proc sqlsqlsql;;;;sql

create table tablenameas select

summary_function(column1) as new1 from library.table

group by byvariable having group_conditions;

quit;

quit;quit;

quit;

SAS SAS SAS SAS/Proc SQL

SQL example 4 - summarizing

proc

proc proc proc sqlsqlsqlsql;;;;

create table cancers as select

idnr,

count(*) as cancers from cblood.cancer group by idnr;

quit;

quit;

quit;

quit;

The count() function returns the number of rows as defined by the group statement

SAS SAS SAS SAS/Proc SQL

SQL example 5 - summarizing

proc proc proc proc sqlsqlsql;;;;sql

create table unlucky_few as select

idnr,

count(*) as count from cblood.cancer group by idnr having count(*) > 10;

quit;

quit;quit;

quit;

Returns 156 unlucky souls…

… with 10 or more cancers

SAS SAS SAS SAS/Proc SQL

Proc SQL syntax – joining tables

proc proc proc proc sqlsqlsqlsql;;;;

create table tablename as select

alias1.column1, alias2.column1

from library.table1 as alias1 [inner|left|right|full] join library.table2 as alias2 on join clause

quit;

quit;

quit;

quit;

Specifies how the two tables are joined Sort of like the by statement in a SAS merge

SAS SAS SAS SAS/Proc SQL

Cartesian products…

proc proc proc proc sqlsqlsqlsql;;;;

create table tablename as select

alias1.column1, alias2.column1

from library.table1 as alias1, library.table2 as alias2 quit;

quit;quit;

quit;

Failing to specify a join clause will create a cartesian product.

i.e. every row in table1 will be matched to every row in table2…

(4)

SQL example 6 - joining

proc proc proc proc sqlsqlsql;;;;sql

create table unlucky_donors as select

a.idnr

from cblood.donor as a

inner join cblood.cancer as b on a.idnr=b.idnr

group by a.idnr having count(*) > 10;

quit;

quit;

quit;

quit;

Returns only 4 unlucky donors

proc proc proc proc sqlsqlsql;;;;sql

Create view table1 as select

coalesce(a.idnr,b.idnr) as idnr from cblood.donor as a

full outer join cblood.cancer as b on a.idnr=b.idnr

group by a.idnr having count(*) > 10;

quit;

quit;quit;

quit;

SAS SAS SAS SAS/Proc SQL

SQL example – select controls

procproc

procproc sqlsqlsqlsql outobs=10;

create table controls as select

a.casenr label='', a.exit as indexdate, b.recipient label='', ranuni(2222) as randomnr from cases a left join available_controls b on a.entry-180180180 le b.entry le a.entry+180180 180180180

and a.country=b.country and a.county=b.county and a.bloodgroup=b.bloodgroup and a.age=b.age

and b.entry < a.exit < b.exit where a.county ne ....

and b.county ne ....

order by a.casenr, calculated randomnr;

quit quit quit quit;

SAS SAS SAS SAS/Proc SQL

What is the pass-through facility?

The Proc SQL pass-through facility is a SAS tool for directly with a database server/RDBMS

The Pass-through facility can:

- Efficiently run complicated queries - Use native RDBMS functions

- Execute RDBMS-specific commands - You can use RDBMS indices

SAS SAS SAS SAS/Proc SQL

When do we pass through?

When Proc SQL doesn’t get it…

proc proc proc proc sqlsqlsqlsql;;;;

create table yearly_counts as

select

year(dondate) as year, count(*) as count from cblood.donation group by year(dondate);

quit;

quit;

quit;

quit;

Not a standard SQL function, so SAS downloads the whole dataset and performs the query locally

~12 minutes

SAS SAS SAS SAS/Proc SQL

So, how do we “Pass-through”?

proc proc proc proc sqlsqlsqlsql;;;;

connect to DBMS-name (connection statements);

select column list

from connection to DBMS-name (

DBMS-query )

disconnect from DBMS-name;

quit;

quit;quit;

quit;

DBMS specific query

(5)

SQL Example 7 - Pass-through

procproc procproc sqlsqlsqlsql;

connect to oracle (user=? path=? password=?);

select

*

from connection to oracle (

select

extract(year from dondate) as year, count(*) as count

from cblood2.donation

group by extract(year from dondate) );

disconnect from oracle;

quitquit quitquit;

The Oracle version of the year() function

~30 seconds

Pass-through galore!

proc proc proc proc sqlsqlsqlsql;;;;

connect to oracle (user=? path=? password=?);

select

*

from connection to oracle (

select a.country, b.sex,

trunc(months_between(a.transdate,b.birthdate)/12) as age, count(*) as count

from cblood2.transfusion a inner join cblood2.persons b on a.idnr=b.idnr and b.birthdate <= a.transdate where extract(year from a.transdate) between 1968 and 2002

and instr(coalesce(b.flag,' '),'ID')=0 group by

a.country, b.sex,

trunc(months_between(a.transdate,b.birthdate)/12) );

quitquit quitquit;

Returns age in exact years

Only individuals with a valid ID…

SAS SAS SAS SAS/Proc SQL

Considerations for Pass-through The SAS implementation of Proc SQL is

anything but standard, so what works with SAS may not work with Oracle…

Oracle has a very odd way of handling missing values

SAS SAS SAS SAS/Proc SQL

Conclusions…

Proc SQL in itself is a very powerful tool

…coupled with a powerful database server, Proc SQL can save you a lot of time

References

Related documents

Indicate by Budget Line Item and the Proposed Activity (based on the information contained in Section 2 PHDEP Plan Budget and Goals), the % of funds that will be expended (at least

• If the merchant continues to support a magnetic stripe only POS solution, they will begin bearing the cost of counterfeit chargebacks if the magnetic stripe on a chip card

It does not articulate the need to make for the school and the teacher the most critical answerability to the children, their parents and the community.. It does not

Inline Table-Value User-Defined Function An Inline Table-Value user-defined function returns a table data type and is an exceptional alternative to a view as the user-defined

Increasing light intensity as you transition into the vegetative and reproductive growth phases will increase the rate of photosynthesis, which will provide the plant with more

For instance, if the kernel is single-threaded, then any user- level performing a blocking system call will cause the entire process to block, even if

Metrics to describe heating patterns are proposed along with methods for calculating them from measured room temperatures. The patterns of heating in 249 dwellings in Leicester, UK

The project sought to produce embedded and engaged scholarship (see Chatterton, 2008; the Autonomous Geographies Collective, 2010) driven by the urban social injustice