Getting Your Data into
SAS
Stephanie R. Thompson
Where is your data
stored?
Oracle tables
SQL Server tables
Microsoft Access
Microsoft Excel
Text file
SPSS
What’s the solution?
SAS/ACCESS Interface!
What can you access?
SAS/ACCESS validation matrix
http://support.sas.com/matrix
Access to Relational Databases
and Data Warehouse
Appliances
Aster Data nCluster DB2 Greenplum Hadoop HP Neoview Informix Microsoft SQL Server MySQL Netezza ODBC OLE DB Oracle (including Sun
Oracle Database Machine [Exadata])
Sybase Sybase IQ Teradata
Access to nonrelational
databases
ADABAS
CA-DATACOM/DB
SAS/Access DATA Step Interface to CA-IDMS IMS
PC Files SYSTEM 2000
Operating System Dependent
How do I know what I
have licensed?
proc setinit
noalias
;
run;
Ways to Access Your Data
SAS libname engine
PROC SQL
ODBC
PROC IMPORT
– Import Wizard
Libname Refresher
Point to where your data are
Think of as shortcuts or nicknames
Can be 1 to 8 characters
– Start with an underscore or letter
– Cannot start w/a number or other special character
Libname General Form
LIBNAME libref
<engine>
‘physical
path‘;
Libref – you provide
<engine>
is optional
– Refers to type of data to read
Physical path is operating system
dependent
Libname Engine
Connect to relational database tables
– Oracle, DB2, etc.
Connect to other file types
– Excel, SPSS, etc.
SAS treats tables like SAS datasets
Need to define the “engine”
Libname Engines
Based on what is licensed
– Some with Base SAS
– SAS/ACCESS optional engines
Some are only for certain platforms
Release resource when done
Excel Example
libname fac 'R:\Employee
Data\Employee Database.xls'
access=readonly ;
data faculty_all(drop = months) ;
set fac.
'No Costing$'n
;
[ add’l SAS statements ]
;
Accessing Databases
Databases on
– PC – Servers
Use of ID and passwords
Additional libname options
Non-SAS Oracle Tips
Check your IP address
– Static or dynamic?
– Determine your IP address:
go to a DOS prompt : Start – run – CMD Type ipconfig/all
IP address listed after “IP Address”
– Obtain a static IP address for your PC.
This will be necessary if your IT department opens the firewall only to specific IP addresses.
tnsnames.ora
General form (from Oracle website)
net_service_name=(DESCRIPTION=
(ADDRESS=(protocol_address_information)) (CONNECT_DATA=
(SERVICE_NAME=service_name))) Can have multiple services in one file
More Non-SAS Oracle Tips
Obtain any security clearances that you
may need.
– Security may be by instance, schema, table, table elements
Install the Oracle client on your PC
Configure the Oracle client to access your
data
Protect your ID and
Password!
libname st 'C:\My Documents\My SAS Files'; data cred;
set st.cred;
call symput('stid', stid); call symput('stpwd', stpwd); run;
Oracle Using Libname
libname mylib
oracle
user=&stid
pw=&stpwd
path=RPTS.SPECTRUMREPORTS
schema=saturn;
“oracle” is the engine – how the data
are stored
Accessing Oracle Tables
proc sql;
create table registered as
select distinct sfrstcr_pidm as pidm from mylib.SFRSTCR
where [add’l SQL statements]
order by sfrstcr_pidm; quit;
Just Using PROC SQL
proc sql noerrorstop;
connect to oracle(user=&stid pw=&stpwd path=RPTS.SPECTRUMREPORTS); create table regdetail as
select * from connection to oracle (select pidm, levl_desc, styp_code from sgbstdn a, sfbetrm
where [add’l SQL statements] ); quit;
Why Use SQL?
Takes advantage of SAS SQL
Pass-Through Facility
Understood by database host system
Standardized language
No need for “translation”
Benefits of using PROC
SQL
SQL Pass-Though Facility!
Benefits of using PROC
SQL (cont’d)
The Pass-Through Facility uses SAS/ACCESS to
connect to a DBMS and to send statements directly to the DBMS for execution. This facility is an
alternative to the SAS/ACCESS LIBNAME
statement. It enables you to use the SQL syntax of your DBMS, and it supports any non-ANSI standard SQL that is supported by your DBMS.
(SAS online documentation)
Benefits of using PROC
SQL (cont’d)
SQL code can be processed on
system hosting the database
– Increase speed – Increase efficiency
Other Notes
DATA step code is not passed to
Oracle when using the LIBNAME
engine
PROC PRINT is passed to Oracle as a
SELECT query
PROC FREQ is sent to Oracle as a
select * from table
Other Notes (cont’d)
WHERE criteria will get passed to
Oracle
SAS functions can be used in PROC
SQL
– (except LAG and DIF)
More functionality in SAS 9.2 and
beyond
Using PROC SQL w/ODBC
proc sql;
connect to odbc(dsn=mdsn uid=&usr pwd=&pwd);
create table studs as
select * from connection to odbc (
[add’l SQL statements] ); disconnect from odbc;
quit;
“mdsn” is the name of the ODBC connection set up on my PC
Setting up an ODBC
Connection
Setting up an ODBC
Connection (cont’d)
Setting up an ODBC
Connection (cont’d)
Why use?
– Needed for certain types of databases SQL Server
– To use the libname engine instead of PROC IMPORT
Import File Types
Base PROC IMPORT
– text files .dat
tab delimited comma separated delimited
Import File Types
SAS/ACCESS to PC Files
– JMP – Excel – MS Access – dBase – LotusPROC IMPORT
Allows you to import many types of PC
files
Add code to your program
Use the Wizard
Use the Wizard to create code for you
PROC IMPORT - Excel
PROC IMPORT OUT= WORK.mydata DATAFILE= "H:\Stephanie\Excel\Student_Data.xls" DBMS=EXCEL REPLACE; SHEET="Sheet1$"; GETNAMES=YES; MIXED=NO; SCANTEXT=YES; USEDATE=YES; SCANTIME=YES;
Import Wizard
Log Window
1 PROC IMPORT OUT= WORK.mydata
2 DATAFILE= "H:\Stephanie\Excel\Student_Data.xls" 3 DBMS=EXCEL REPLACE; 4 SHEET="Sheet1$"; 5 GETNAMES=YES; 6 MIXED=NO; 7 SCANTEXT=YES; 8 USEDATE=YES; 9 SCANTIME=YES; 10 RUN;
NOTE: WORK.MYDATA was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 2.37 seconds
Flat Files
Delimited
– Import Wizard – Data Step
Flat File (non-delimited)
– Data step
Know your data!
References
Absolute
– Preceded by the @ symbol
– Indicate a specific column to begin reading in data
– Sample line
input @1 week 2. @5 home $2.;
References (cont’d)
Relative
– Relative to current location – Can be positive or negative – Sample line
Tips
Know Thy Data!
Understand record layout for flat files
Libname engine vs. import
– Depends on what you need to do – What you have licensed
Tips (cont’d)
Database Views
– Contain pre-combined sets of information – Can cover many different base tables – May include pre-calculated variables – Can be resource intensive
Tips (cont’d)
Tables
– Learn table structures One record per key Multiple records per key – Use validation tables
– Know keys needed to join tables Single?
Many?
Tips (cont’d)
Data step vs. PROC SQL
– Both will work
– Data step runs much more slowly System dependent
Try both and see which is faster – Data step only w / libname engine – No SQL pass-through w / data step
Enterprise Guide