Using SAS ACCESS to retrieve and store
data in relational database management
systems
Daniel Ricard
Department of Biology Dalhousie University
Background Examples
Disclaimer
SAS is the only proprietary software that I use I use SAS to fit models, and for nothing else I use SAS for Unix running on a variety of Unices
My computing toolkit
Linux OS (Ubuntu and CentOS) PostgreSQL with postGIS unixODBC
The R software environment for statistics and graphics The Generic Mapping Tools (GMT)
The usual suspects (Emacs, Perl, shell scripts)
Background Examples
What I do
Fisheries research
Models of fish population dynamics to determine past, present and future abundance and biomass.
The catch rate is used to estimate abundance and biomass.
Counting fish in the ocean is like counting trees in the forest, except that you can’t see them and they move.
Background Examples
What I do
Fisheries research
Models of fish population dynamics to determine past, present and future abundance and biomass.
The catch rate is used to estimate abundance and biomass.
Counting fish in the ocean is like counting trees in the forest, except that you can’t see them and they move.
Background Examples
What I do
Fisheries research
Models of fish population dynamics to determine past, present and future abundance and biomass.
The catch rate is used to estimate abundance and biomass.
Counting fish in the ocean is like counting trees in the forest, except that you can’t see them and they move.
Background Examples
Relational Database Management System (RDBMS)
What?
Structured Query Language (SQL) Create tables and load data Access data through SQL queries Oracle, DB2, MS Access, PostgreSQL, mySQL, ...
Why?
Indexing of data and optimisation of queries to perform retrievals
Provides a “neutral” layer that can be used by most analytical software through database bindings
OpenGIS ”Simple Features Specification for SQL” - to define geospatial data and perform queries with spatial operators
Background Examples
Relational Database Management System (RDBMS)
What?
Structured Query Language (SQL) Create tables and load data Access data through SQL queries Oracle, DB2, MS Access, PostgreSQL, mySQL, ...
Why?
Indexing of data and optimisation of queries to perform retrievals
Provides a “neutral” layer that can be used by most analytical software through database bindings
OpenGIS ”Simple Features Specification for SQL” - to define geospatial data and perform queries with spatial operators
Background Examples
SAS ACCESS
Database connectivity to Oracle, DB2 and ODBC PROC SQL with pass-through facility
PostGIS runs on PostgreSQL
No support for PostgreSQL or mySQL
However, Open Source RDBMS all come with ODBC drivers
SAS/ACCESS(R) 9.1.3 for Relational Databases: Reference, Second Edition
Background Examples
ODBC
Open Database Connectivity (ODBC) provides a standard software API method for using databasemanagement systems
Background Examples
Getting data from a RDBMS into SAS
Assuming that you have an ODBC Data Source Name (DSN) called “DSNentry”, the following code will allow you to define the RDBMS as a SAS libref.
libname myRDBMSlib odbc USER=’username’
DATASRC= ’DSNentry’ SCHEMA=’public’;
Background Examples
Getting data from a RDBMS into SAS (cont.)
Assuming that you have an ODBC DSN called “DSNentry”, the following code using PROC SQL pass-through facility will allow you to get data directly from the RDBMS.
proc sql;
connect to odbc (user=username datasrc=DSNentry); create table mydata as
select * from connection to odbc (
SQL QUERY GOES HERE );
disconnect from odbc; quit;
Background Examples
Moving data out of SAS and into a RDBMS
libname gfsDB odbc USER=’postgres’ DATASRC= ’gfsDB’ SCHEMA=’public’; PROC GENMOD data=mydata;
ods output ParameterEstimates = myestimates; ...
RUN;
PROC SQL;
CREATE TABLE gfsDB.isoest AS
SELECT * FROM myestimates; QUIT;
Background Examples
Pitfalls or why I had to call the SAS helpline
PROC SQL;
CREATE TABLE gfsDB.isoest(dbtype= (Label=’varchar(30)’ Estimate=’numeric(12,5)’ StdErr=’numeric(12,5)’ LowerCL=’numeric(12,5)’ UpperCL=’numeric(12,5)’ ChiSq=’numeric(12,5)’ ProbChiSq=’numeric(12,5)’) ) AS
SELECT * FROM myestimates; QUIT;
Background Examples
Doing the same in R
Using the RODBC package, the same can be accomplished in R.
# R code
require(RODBC)
chan <- odbcConnect(dsn="gfsDB", uid = "postgres", case=’postgresql’,believeNRows=FALSE) qu <- paste("SQL QUERY GOES HERE ")
mydata <- sqlQuery(chan, qu, errors= TRUE)
Summary
RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far
SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Summary
RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far
SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Summary
RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far
SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data
Summary
RDBMS are unmatched for storing and retrieving data, spreadsheets can only go so far
SAS, like all other statistical software, provides tools to access a variety of RDBMS and to invoke SQL queries to obtain data Once a query is developed, it can be used from a variety of software, hence facilitating collaborations with other scientists RDBMS can be configured to be accessible on the Internet, hence a powerful tool for sharing data