SAS PROGRAMMING ESSENTIALS FOR STATISTICAL COMPUTING IN DAIRY RESEARCH

(1)

COMPUTING IN DAIRY RESEARCH

A.K. Sharma NDRI, Karnal-132 001

aksharma@ndri.res.in 1. Introduction

Statistical Computing is the interface between Statistics and Computer Science. It is the area of computational science (or scientific computing) specific to the mathematical science of Statistics.

The rapid and sustained increase in computing power starting from the second half of the 20^th century has made a substantial impact on the practice of Statistical Science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as connectionist models) as well as the creation of new types, such as generalised linear models and multivariate and multilevel models. Increased computing power has also led to the growing popularity of computationally intensive methods. The computer revolution has implications for the future of Statistics with new emphasis on “Experimental” and “Empirical” Statistics. A large number of both general and special purpose statistical software are now available.

SAS (pronounced “sass”, originally Statistical Analysis System) is such an integrated system of software products provided by SAS Institute Inc., Cary, North Carolina, USA that enables Scientists to perform various tasks, viz.,

 Data Entry, Retrieval, Management and Mining

 Report Writing and Graphics

 Statistical Analysis

 Forecasting and Decision Support

 Operations Research and Project Management

 Quality Improvement

 Data Warehousing

 Platform Independent and Remote Computing

 Grid and Cloud Computing, etc.

SAS was conceived by Anthony J. Barr in 1966. As a North Carolina State University graduate student during 1962-64, Barr had created an Analysis of Variance Modelling Language inspired by the notation of statistician Sir Maurice G. Kendall, followed by a Multiple Regression Programme that generated machine code for performing algebraic transformations of the raw data. Drawing on those programmes and his experience with structured data files, he created SAS, placing statistical procedures into a formatted file framework. From 1966 to 1968, Barr developed the fundamental structure and language of SAS. By 1971, SAS was gaining popularity within the academic community. Strength of the system was analysing experiments with missing data, which was useful especially to the agricultural and pharmaceutical industries among others.

In 1973, extensive programming in econometrics, time series, and matrix algebra were made. In 1976, SAS Institute, Inc. was incorporated by Barr, et al.

(2)

Base SAS – An Overview

Base SAS is the core foundation for variety of data management and analytic software components under SAS. It provides essential tools for the basic data-driven tasks that are performed by the scientists, i.e., data access, management, analysis and presentation (Fig.1).

Fig.1: Base SAS.

The SAS programming language, ready-to-use procedures and the windowing interface can be used to access research data from many sources, manage, analyse and present the research data as meaningful information in the form of a report that can be delivered to any platform, in any format as per need. The reports thus generated can be published on the Web for its wider dissemination among various stakeholders.

Accessing data

As stated earlier, you can access data using SAS regardless of the data sources or platforms on which it resides, i.e., you can access data that are stored (almost) anywhere, whether these are in a file on your system, or data that is stored on a remote server or in another database system in almost any format, including raw data, SAS data sets, and files created by other vendors’

software. You can access various types of file with SAS that include, reading raw data in any format from any kind of file including variable-length records, binary files, free-formatted data, even files with missing data; accessing some other vendors’ files directly, including BMDP, SPSS, and OSIRIS files. For others, you can use SAS/ACCESS tool to access external data as if it were native to SAS, e.g., reading data stored in a Microsoft Excel spreadsheet, in a Microsoft Access table, in a dBase file, or in ORACLE or another DBMS.

Managing data

After having accessed the data, the same can be manipulated using the SAS programming language, e.g., format the data, create variables (columns), use operators to evaluate data values,

Data Access and Management

Application Development

Reporting and

Graphics Analytics

base SAS

(3)

use functions to create and recode data values, subset data, perform conditional processing, merge a wide range of data sources, create, retrieve, and update database information.

Analysing data and presenting information

Once the data is in shape, you can use SAS to analyse the data and produce reports. The resultant SAS output can range from a simple listing of a data set to customised reports of complex relationships. Base SAS provides powerful data analysis tools to produce tables, frequency counts, and cross-tabulation tables; create a variety of charts and plots; compute a variety of descriptive statistics, including the mean, sum, variance, standard deviation and more; compute correlations and other measures of association, as well as multi-way cross-tabulations and inferential statistics, etc.

For reporting and displaying analytical results, SAS provides several visually appealing output formats, such as an array of mark-up languages including HTML4 and XML; output that is formatted for a high-resolution printer, PostScript, PDF, and PCL files; RTF; interactive coloured graphs using ActiveX controls or Java applets. Finally, these reports can be delivered to a wide variety of locations and platforms in order to suit your needs.

SAS Workspace

SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks you need to do. Once you get familiar with the starting points for your SAS tasks, you are ready to accomplish any task that SAS can do. SAS workspace is organised in the following five main windows:

The Explorer Window

In the Explorer Window, you can view and manage your SAS files and create shortcuts to files that are not formatted by SAS. You can use this window to:

 View and manage SAS files and create shortcuts to files that are not formatted by SAS

 Create new SAS libraries and SAS files

 Open any SAS file

 Perform most file management tasks such as moving, copying, and deleting files.

Program Editor or Editor Window

You can use either of these windows to enter, edit, and submit SAS programs. The Editor Window provides several useful editing features such as Colour Coding and Syntax Checking of SAS language; Expandable and Collapsible sections; Multilevel Undo and Redo, etc. The initial Editor Window title is Editor - Untitled. When you open a file or save the contents of the Editor Window to a file, the window title changes to reflect that file name. When the contents of the Editor Window are modified, an asterisk is added to the title. You can have multiple Editor Windows open at the same time.

Log Window

The Log Window displays messages about your SAS session and any SAS programs that you submit.

(4)

Output Window

The Output Window displays the output from SAS programs that you submit. It automatically opens as soon as an output is created. In the MS-Windows operating environment, the Output Window is positioned behind the Log and Editor Windows until there is output to display. You can navigate between windows using the taskbar. Several SAS programs may not create any output in the Output Window. Some programs open interactive windows while other programs only produce messages in the Log Window. If you create HTML output, you can view it in the Results Viewer Window, which is the internal browser for SAS.

Results Window

The Results Window helps us to navigate and manage output from SAS programs that you submit. You can view, save and print individual items of the output. Recall that the Results Viewer Window is the place where you actually view HTML output. The Results Window is empty until you submit a SAS program that creates output. Then it opens automatically. The Results Window is positioned in front of the Explorer Window when SAS creates output. You can move between the two windows by using the tabs at the bottom of the windows.

SAS data sets

Before you start working with the data in SAS, these data must be in a special form called a SAS data set. Understanding SAS data sets is the first and foremost step towards learning about SAS programming. Conceptually, a SAS data set (also called a table) is a file containing descriptor information and related data values. The file is organised as a table of observations (rows), also called cases and variables (columns) that SAS can process. Some SAS data sets also contain an index, which enables SAS to locate records in the data set. In some special cases, such as using SAS/ACCESS to read database management system files directly, the SAS data set may contain only the logic for accessing the data, not the data itself! However, for this presentation, you assume that SAS data sets contain data. To work with SAS data sets, it is also important to understand as to how are they stored? Note that all SAS files are stored in a SAS library, which is a collection of files such as SAS data sets and catalogues. In the MS-Windows operating environment, a SAS library is typically a group of SAS files in the same folder. To access a library, assign it a name (also known as a libref, or library reference). The library names are nicknames or shortcuts that you can use to identify libraries during a SAS session. In this presentation, you will learn about assigning SAS libraries, and then to work with SAS data sets in a library. SAS assigns three libraries, viz., Sashelp, Sasuser and Work, automatically each time you start SAS. In this presentation, you learn about these libraries and view file types in the Sashelp library. In the Explorer Window, double-click Libraries. Notice that there are three libraries. Sashelp is a permanent library that contains sample data and other files, which control how SAS works at your site; this is a read-only library. Sasuser is also a permanent library that contains SAS files in the Profile Catalogue that store your personal settings; this is also a convenient place to store your own files. Work is a temporary library for files that do not need to be saved from session to session. Note that Sashelp, Sasuser and the user-assigned libraries are permanent libraries. Files in permanent libraries are saved in the operating environment. Work is a temporary library; i.e., files in the Work library are not saved once you end the SAS session.

Double-click the Sashelp library. Scroll the Explorer Window and notice that there are several types of files, or members, in the library.

(5)

Assign a library

When you define a library, you indicate the location of your SAS files to SAS. Once you define a library, you can manage SAS files within it. In this presentation, you use the New Library Window to assign a library. On the toolbar, click the New Library tool; the New Library Window opens. In the Name box, type Aksas (or any name you like). Library names are limited to 8 characters, which must start with a letter or underscore. The library names can contain only letters, numerals, or underscores. Select the Enable at Startup check box; this library will be automatically assigned each time you start a SAS session. Click Browse; select the default location or select another location in your operating environment. Any files that you save to the Aksas library will be saved in the folder that you designate in the Path box; click OK. You can delete SAS libraries. Note that when you delete a SAS library, SAS no longer has access to the directory. However, the contents of the library still exist in your operating environment.

Import different format data files as SAS data sets and rename the data set files Example 1

Consider the following MS-Excel worksheet (lact.xlsx) containing data pertaining to milk production (state wise, up to first four lactations) for different breeds of dairy cattle. The same worksheet file can be saved as ASCII text file (say, lact.txt) by selecting appropriate file type at the time of saving the worksheet.

Let us see how to add and rename a SAS data set. Now that you have a library, you can add a SAS data set to the library. With the Explorer Window active, select View ► Show Tree; the libraries are displayed in the left pane of the window. Click the Aksas library. Import lact.xlsx (or lact.txt) file to SAS using File ► Import Data... option. Click Aksas; notice that lact file has been imported there. Right-click lact and select Rename. Type lactation for the new name and click OK. SAS data set names must be 1 to 32 characters in length; begin with a letter (A-Z,

STATE LACT BREED MY

Haryana 1 Hariana 18

Haryana 1 Tharparkar 23 Haryana 2 Tharparkar 25 Haryana 3 Tharparkar 22 Haryana 4 Tharparkar 21

Haryana 1 Sahiwal 22

Haryana 1 Red Sindhi 27 Haryana 2 Red Sindhi 26 Haryana 3 Red Sindhi 25 Haryana 4 Red Sindhi 25

STATE LACT BREED MY

Panjab 1 Hariana 25

Panjab 2 Hariana 36

Panjab 3 Hariana 35

Panjab 4 Hariana 37

Panjab 1 Tharparkar 30 Panjab 2 Tharparkar 32 Panjab 3 Tharparkar 29 Panjab 4 Tharparkar 30

Panjab 1 Sahiwal 26

Panjab 2 Sahiwal 29

Panjab 3 Sahiwal 30

Panjab 4 Sahiwal 38

Panjab 1 Red Sindhi 29 Panjab 2 Red Sindhi 25 Panjab 3 Red Sindhi 27 Panjab 4 Red Sindhi 28

(6)

including mixed case characters) or an underscore ( _ ); and continue with any combination of numbers, letters, or underscores.

Open a SAS data set

Now that you’ve created a data set, view the data that it contains. There are many other ways to get your data into a SAS data set. In general, you can enter data directly into a SAS data set using the VIEWTABLE Window; read raw data into a SAS data set using the Import wizard or SAS programming statements; read and modify existing data; convert other vendors’ data files into SAS data sets using SAS/ACCESS; read other vendor’s data directly using SAS/ACCESS. In the Explorer Window, double-click the lactation table in the Aksas library; the table opens in the VIEWTABLE Window; scroll the VIEWTABLE Window and notice that there are 32 rows (also called observations or cases, both terms are used here synonymously) and 4 columns (also called variables) sets using SAS programming statements.

View general data properties

The descriptor portion of a SAS data set contains information about the data set, including the name of the data set; the date and time the data set was created; the number of observations; the number of variables; you can see this information by viewing the general properties of a data set.

In the Explorer Window, right-click the Lactation table and select Properties. In the General tab, view the data set’s properties. (Don’t close this window yet as you’ll need it for the next step too).

View column attributes

Besides general information about the data set, the descriptor portion also contains attribute information for each variable in the data set including the variable’s name, type, length, format, informat (input format), and label. In the Aksas.lactation Properties Window, click the Columns tab. In the Column Name column, notice that all the variables for the data set are listed along with a symbol that indicates the variable’s type.

Variables and Labels

Variable (column) names must be 1 to 32 characters in length; begin with a letter (A-Z, including mixed case characters) or an underscore ( _ ); continue with any combination of numbers, letters, or underscores. SAS is case insensitive for variable names; however, SAS remembers the first occurrence of the variable and writes it that way in output. Now, look at the Label column, a label is descriptive text up to 256 characters; labels are used instead of variable names in some reports and for the column headings in the VIEWTABLE Window.

Types of Variables

Further, look at the Type column. There are only two types of variables in SAS, viz., character and numeric. Character variables are listed as Text in the type column, and numeric variables are listed as Number in the Type column. The Length attribute is related to the variable’s type.

Character variables can contain any values; use a blank to represent missing values; can be up to 32k long. Numeric variables can contain only numeric values (the digits 0 through 9, +, -, ., and E for scientific notation); use a single period (.) to represent missing values; have a default length of 8; numeric values (no matter how many digits they contain) are stored as floating point numbers in 8 bytes of storage, unless you specify another length.

(7)

Change a variable’s format

Formats are variable attributes that affect the way data values are written. SAS offers a variety of formats for numeric and character data. In this session you change the format of a variable. In the Aksas.lactation Properties Window, look at the Format column; the MY variable uses the SAS default format, which displays the value 18.00 as 18 in a report. In general, SAS formats have a name; a w value, which specifies the width that is used for displaying the value, a period following the w value, numeric SAS formats, such as the Fw.d format, can also specify a d value, which is the number of decimal places to be displayed. Open the lactation table if it is not already open; right-click the MY column heading and select Column Attributes from the shortcut menu (Fig.2); in the Column Attributes Window, click the selector next to the Format box.

Fig.2: Viewing Column Attributes of Aksas.lactation data set.

In the Format Window, view the available formats and their descriptions. When you select a format in this list, an example is displayed in the Format Details area. Let’s say that you want to incorporate the decimal places for the values of the MY variable. Change the value in the Decimal box to 12.2; click OK. In the Column Attributes Window, click Apply; then click Close. View the lactation table and notice that the values in the MY column now contain decimal places up to two decimal points.

(8)

Fig.3: Change a variable’s format.

1. SAS programming essentials

The SAS programming language is both powerful and flexible. You can program any number of analyses and reports with it. The SAS language contains statements, expressions, functions and CALL routines, options, formats, and informats, i.e., elements that many programming languages share. SAS can also simplify programming for you with its library of built-in programs known as SAS procedures.

Create and run SAS programs

SAS procedures use data values from SAS data sets to produce pre-programmed reports, requiring minimal effort from users. Base SAS procedures enable you to manipulate data; store and retrieve information; perform statistical analysis; create reports. This session shows you the main characteristics of SAS programs, i.e., components of SAS programs. Let’s begin by looking at a simple SAS program. The sample SAS program below (Fig.4) contains two steps, a DATA step and a PROC step. These two types of steps, alone or combined, form all SAS programs:

Fig.4: A sample SAS program showing the DATA and PROC steps.

(9)

Components of a SAS program

DATA step typically creates or modifies SAS data set. Also, it is used to produce customised reports. For example, you can use DATA step to:

 put your data into a SAS data set

 compute the values for new variables

 check for and correct errors in your data

 produce new SAS data sets by subsetting, merging, and updating existing data sets, etc.

Note that, in the sample program (Fig.4), the DATA step creates a new SAS data set, lact_pp.sas7bdat in the Aksas library using the existing SAS data set lactation.sas7bdat in the same library. Note that we do not directly use the original data set (lactation, in this case) rather a new data file (lact_pp, in this case) is created from the original data set, which is actually utilised by SAS for analysis of the underlying data.

PROC (procedure) steps typically analyse and process data presented in the form of a SAS data set; and they sometimes create SAS data sets that contain the results of the procedure. PROC steps control a library of pre-written routines (procedures) that perform tasks on SAS data sets, such as listing, sorting, and summarising data, e.g., you can use PROC steps to print a report;

produce descriptive statistics; create a tabular report; produce plots and charts, etc. The sample program (Fig.4) uses the PROC PRINT procedure, which prints the data in a data set. The DATA= option tells SAS what data to use for the procedure.

Characteristics of SAS Programs

Let’s have a glance at the individual statements in the sample program (Fig.4). SAS programs comprise of several SAS statements. A SAS statement has two important characteristics, i.e., it usually begins with a SAS keyword; and it always ends with a semicolon. A DATA step begins with the keyword DATA, and a PROC step begins with the keyword PROC. Generally, a step ends with a RUN statement or when a new DATA or PROC step begins. A RUN statement tells SAS to process all the preceding lines of the step. SAS statements are free-format, i.e., they can begin and end anywhere on a line; one statement can continue over several lines; and several statements can be on a single line. SAS statements are not case sensitive. Blanks or special characters separate the ‘words’ in a SAS statement.

Composing a SAS program into the Editor Window

Let’s compose the sample SAS programs [i.e., Code 1(a) and Code 1(b)] given below. These programs create a new SAS data set from an existing SAS data set and then print a listing of the new data set. The Aksas.lactation data set was created in Aksas with SAS data sets. If you didn’t create it, return to the task to define the library and create the data set before proceeding here. Type the following codes (one-by-one) separately, into the Editor Window and save:

(10)

Code 1(a): One-level name. Code 1(b): Two-level name.

LIBNAME aksas ‘d:\aksas’;

DATA lact_1a;

SET aksas.lactation;

RUN;

PROC PRINT DATA=lact_1a;

RUN;

DATA aksas.lact_1b;

RUN;

PROC PRINT DATA=aksas.lact_1b;

RUN;

Notice that when you write this text into the Editor Window, SAS keywords are colour-coded and the program steps are separated into expandable blocks. The DATA step in Code 1(a) creates a new SAS data set named lact_1a. Note that lact_1a is a one-level name. Since no library name is specified for the lact_1a data set, SAS uses by default the Work library to temporarily store this file. The lact_1a data set is created from the lactation data set that you stored in your Aksas library. Notice that Aksas.lactation and Aksas.lact_1b are two-level names [as shown in the Code 1(a) and Code 1(b)].The PROC PRINT step lists all the data in the Work.lact_1a data set. If the user-defined library is assigned as in case of the Code 1(b), SAS uses the user-defined library (Aksas in present case) rather than the Work library for one-level names.

Submit the program and view output

When you submit a SAS program, SAS compiles (checks the code for its grammatical correctness in SAS perspective) and executes (if compilation process is successful) the code and returns expected results (provided the code is logically correct) to the Output Window.

1. With the Editor Window active, select Run ►Submit. To submit only a portion of a program in the Editor Window, highlight the portion you want to submit, right-click the highlighted area, and select Submit ► Selection.

2. The Output Window comes to the front (Fig.5). Scroll the window and notice that the PRINT procedure has printed all the data from the lactation data set. In addition, the procedure has automatically formatted the data into rows and columns.

Fig.5: Results of the Code 1(a) SAS program being displayed in the OUTPUT Window.

(11)

View the log

Each time a step is executed, SAS generates a log of the processing activities and the results of the processing. The SAS log collects messages about the processing of SAS programs and any errors that may occur. Click the Log Window to activate it. The contents of the Log Window produced by SAS for the Code 1(a) program are shown in Fig.6. Notice that there are separate messages for the DATA and PROC steps. The messages from the DATA step tell you that SAS read 32 observations from the Aksas.lactation data set and that there are 32 observations and 4 variables in the new data set Work.lact_1a. The messages from the PROC step tell you that 32 observations were read during the PRINT procedure. There are no error messages.

Fig.6: The contents of the Log Window produced by SAS while executing the Code 1(a) program.

It’s important that you understand how SAS processes DATA steps (one observation at a time) and PROC steps. You should also learn methods and tips for debugging SAS programs. For details on these topics, refer to the SAS documentation.

Use the Results Window

Once you have created output for Code 1(a), the Results Window opens. On windows, the Results Window covers the Explorer Window. You can switch between the two windows using the tabs at the bottom of the window.

1. Look at the Results Window. The Results node at the top of the window contains a folder for the output of the PROC PRINT procedure (Fig.7). The folder is labeled with the procedure name and the title (default in present example) of the report.

2. Click the expansion icon next to the folder.

3. Double-click Data Set Work.lact_1a. The Output Window (Fig.5) scrolls to the top of the output.

(12)

Fig.7: The contents of the Results Window produced by SAS while executing the Code 1(a) program.

Note that each time you run a program that creates output; the new output is appended to the last output in the Output Window. You can double-click the output item in the Results Window to scroll the Output Window to the top of that particular output. If you clear the Output Window, the links in the Results Window are deleted.

Create HTML output

You’ve seen that you can create SAS listing output in the Output Window, but you can also set preferences to create HTML output (Fig.8). Your preferences are stored from session to session.

1. Select Tools ► Options ► Preferences.

2. Click the Results tab.

3. Select the Create HTML check box and click OK.

(13)

Fig.8: Setting preferences to create HTML output.

Modify the program

Now that you have seen the output and formatting for the PRINT procedure, let’s try adding options to the program to incorporate the following changes:

 Remove the date from the top of each page, which SAS prints by default

 Replace the default “The SAS System” title with

 Use the column label rather than the column name in the output

 Remove the Obs column.

We can do this by using a SAS system option, a TITLE statement, and options on the PROC PRINT statement. To remove the date from the top of each page, use the NODATE system option. Insert the following OPTIONS statement at the top of the program: options nodate; to replace the default title, use the TITLE statement. Insert the line, title “State, Lactation and Breed wise Milk Yield Report”; above the RUN statement for the PROC PRINT procedure.

By including the TITLE statement in the PROC step, the title is applied to the output of the procedure. To remove the Obs column and use column labels, use options on the PROC PRINT statement. Change the PROC PRINT statement to add the LABEL and NOOBS options. Your PROC PRINT statement should look like this:

proc print data=lact_1b label noobs. The revised program should look like as shown below.

OPTIONS NODATE;

DATA aksas.lact_1b;

RUN;

PROC PRINT DATA=aksas.lact_1b label noobs;

TITLE "State, Lactation and Breed wise Milk Yield Report";

(14)

RUN;

OPTIONS;

TITLE;

Select Run ► Submit the above program; it produces the following revised output (compare it with that given in Fig.5):

Fig.9: Revised output without system date as well as having user-defined title replacing the default SAS title on top of the report.

With the Results Viewer Window active, select File ► Save As. In the File name box, type e:\aksas\Lactation.htm. Click Save. Look at the Results Window. Notice that there is a new folder for the results of the program you just ran. The folder name uses the State, Lactation and Breed wise Milk Yield Report title that you specified in your program. Double-click the Print:

State, Lactation and Breed wise Milk Yield Report folder, then double-click Data Set Aksas.lact_1b. There are links to both the listing output and the HTML output (Fig.10).

(15)

Fig.10: Revised Results with links to simple output and HTML formats.

Now you’ve learned the basic steps for creating, submitting (running), and saving SAS programs.

Creating simple data sets and manipulating the data with SAS program

Earlier, we imported data from an already existing data file in different formats such as MS- Excel worksheet or ASCII formats to build up SAS data set. The following example program illustrates as to how to create a SAS data set directly through programming.

Example 2: Let us create directly a SAS data set for the data given in Example-1 above.

We use the following program (a revised version of code 1(b) above). The data set is saved as aksas.lactation_.sas7bdat. Explanation of the code is given at its end.

OPTIONS NODATE;

LIBNAME aksas 'd:\aksas';

DATA aksas.lactation_;

INPUT state $ lact $ breed $ MY;

(16)

CARDS;

Haryana 1 Hariana 18 Haryana 2 Hariana 19 Haryana 3 Hariana 21 Haryana 4 Hariana 20

Haryana 1 Tharparkar 23

Haryana 1 Sahiwal 22 Haryana 2 Sahiwal 24 Haryana 3 Sahiwal 23 Haryana 4 Sahiwal 23

Haryana 1 Red Sindhi 27

Panjab 1 Hariana 25 Panjab 2 Hariana 36 Panjab 3 Hariana 35 Panjab 4 Hariana 37

Panjab 1 Tharparkar 30 Panjab 2 Tharparkar 32 Panjab 3 Tharparkar 29 Panjab 4 Tharparkar 30 Panjab 1 Sahiwal 26

Panjab 2 Sahiwal 29 Panjab 3 Sahiwal 30 Panjab 4 Sahiwal 38

Panjab 1 Red Sindhi 29 Panjab 2 Red Sindhi 25 Panjab 3 Red Sindhi 27 Panjab 4 Red Sindhi 28

;

PROC PRINT DATA = aksas.lactation_ label noobs;

RUN;

OPTIONS;

TITLE;

Program Explanation (Example 2):

As stated earlier, this program creates a data set, lactation_.sas7bdat under Aksas library. The above program needs some explanation for readers’ comprehension. The ‘input’ statement has been incorporated under the DATA step. This statement creates the variables, state, lact, breed and my denoting State Name, Lactation Number, Breed Name and Milk Yield, respectively.

Note the ‘$’ symbol following the variable names, state, lact and breed each under input statement indicates that these variables are all of type ‘alpha-numeric’ or ‘non-numeric’, or

‘character’, i.e., no statistical/mathematical analysis is employed on these variables, though these variables may be used as grouping variables for classification of the underlying data. The readers

(17)

should note it carefully that the data values assumed by the lact variable seem to be numeric (1- 4) but these values are like only serial numbers, which do not directly participate in the data analysis, e.g., you will never compute average lactation number as it makes no sense. Hence, such variables are better defined as character type variables (to be used as grouping variables) rather than numeric variables, for you will not appreciate if the lactation number is represented as 1.00, 2.00…4.00. The data type of the variable, my (denoting milk yield) is numeric. Further, the statement ‘cards’ used in conjunction with the input statement indicates to the SAS system that each of the rows following cards statement represent one case or record pertaining to the milk production of a particular breed of dairy cow in a specific state, pertaining to a specific lactation number. Several cases can be included here and the end of the data is marked with a semi-colon (;). Note it carefully that the DATA step does not terminate with ‘run’ statement in such a case;

rather it ends with only a semi-colon. Also, note that each variable in input statement is separated with a blank space. Each case is written on a separate line and the data value for each variable on this line is written in the same order as defined in the input statement. This program produces same output as shown in Fig.9.

Example 3: The following table gives the gain in body weight (kg) per heifer during four grazing treatments:

Heifer number Gain in body weight (kg) Treatments

T1 T2 T3 T4

1 67.3 74.2 63.1 48.7

2 36.9 42.2 32.9 49.0

3 63.2 58.6 59.2 62.0

4 26.8 36.6 42.4 38.8

5 54.8 54.6 34.0 48.2

6 64.2 81.8 65.6

7 81.4

Test whether the mean gain in weight of heifers under four treatments is equal or not?

You know that this is a typical example of one-way Analysis of Variance (ANOVA). Therefore, the SAS program to first create the data set and then perform the one-way ANOVA on this data set is as follows:

DATA aksas.bla256;

INPUT TREAT $ GAIN;

CARDS;

T1 67.3 T1 36.9 T1 63.2 T1 26.8 T1 54.8 T1 64.2

(18)

T1 81.4 T2 74.2 T2 42.2 T2 58.6 T2 36.6 T2 54.6 T2 81.8 T3 63.1 T3 32.9 T3 59.2 T3 42.4 T3 34.0 T3 65.6 T4 48.7 T4 49.0 T4 62.0 T4 38.8 T4 48.2 ;

PROC ANOVA DATA=aksas.bla256;

CLASS TREAT;

MODEL GAIN=TREAT;

TITLE 'COMPARE MEAN GAIN IN BODY WEIGHT OF HEIFERS UNDER FOUR TREATMENTS - ONE-WAY ANOVA';

RUN;

QUIT;

Program explanation (Example 3):

Similar to previous Example-2, the DATA step here creates data set containing data on two variables, TREAT and GAIN denoting grazing treatments (grouping variable) and gain in body weight, respectively. Note that the cases are presented sequentially treatment wise, i.e., following the order: T₁, T₂, T₃ and T₄. The SAS system expects you to present the one-way data for ANOVA in this fashion only. Now, further look at the PROC ANOVA step. (Note: the PROC ANOVA is presented here so as to highlight the layout of the data within the data set under DATA step in perspective with ANOVA as the arrangement of data in data set depends upon the intended analysis tool to be employed on the data set). The PROC ANOVA step performs one- way analysis of variance procedure on the underlying data set. PROC ANOVA is relatively easy to use. Let’s look at the code statement-by-statement.

PROC ANOVA DATA = aksas.bla256;

This statement defines that we want to use the ANOVA procedure on the data set called aksas.bla256.sas7bdat, which was created under the DATA step above. Now, consider the next statement in PROC ANOVA step:

CLASS TREAT;

(19)

This defines TREAT (independent variable) as class variable. This is how we specify the associated values to be grouped. Further, consider the next statement:

MODEL GAIN=TREAT;

It tells SAS what model we want to test. This statement signifies that GAIN (on LHS to ‘=’ sign) is the response variable or dependent variable and TREAT (on RHS to ‘=’ sign) is a factor or independent variable. Complete description of any SAS step and statements contained therein can be obtained through inbuilt HELP documentation. This program produces the ANOVA table along with several other results as follows:

COMPARE MEAN GAIN IN BODY WEIGHT OF HEIFERS UNDER FOUR TREATMENTS - ONE-WAY ANOVA

The ANOVA Procedure

Class Level Information Class Levels Values TREAT 4 T1 T2 T3 T4

Number of Observations Read 24 Number of Observations Used 24

The ANOVA Procedure

Dependent Variable: GAIN

Source DF

Sum of

Squares Mean Square F Value Pr > F Model 3 359.889964 119.963321 0.48 0.7022 Error 20 5035.919619 251.795981

Corrected Total 23 5395.809583

R-Square Coeff Var Root MSE GAIN Mean 0.066698 29.60233 15.86808 53.60417

Source DF Anova SS Mean Square F Value Pr > F TREAT 3 359.8899643 119.9633214 0.48 0.7022

Note: Complete details of the analysis and interpretation of the results are discussed elsewhere (by different authors) in this e-course.

(20)

Example 4: The following table gives the fat content for four batches of paneer : Batch

1 2 3 4

Fat Content

50.3 54.1 57.5 55.3 52.2 53.7 56.3 52.8 52.5 52.5 55.8 53.5 51.7 55.2 56.9 51.7

52.6 55.8

53.2

Write a SAS program to create data set to analyse the data for variation among batches with regard to fat content in paneer.

This is the example of one-way ANOVA, which is similar to the previous Example-3. Therefore, the SAS program to create the requisite data set file for the data given in the above table and to print the observations contained in the newly created data file is a follows:

OPTIONS NODATE;

DATA aksas.ex4;

INPUT batch $ fat_cont;

cards;

B1 50.3 B1 52.2 B1 52.5 B1 51.7 B1 52.6 B1 53.2 B2 54.1 B2 53.7 B2 52.5 B2 55.2 B3 57.5 B3 56.3 B3 55.8 B3 56.9 B3 55.8 B4 55.3 B4 52.8 B4 53.5 B4 51.7

;

PROC PRINT DATA = aksas.ex4;

TITLE "Analysis of variance between batches w.r.t. fat content in Paneer";

RUN;

TITLE;

OPTIONS;

QUIT;

(21)

Program output (Example 4)

Analysis of variance between batches w.r.t. fat content in Paneer Obs batch fat_cont

1 B1 50.3 2 B1 52.2 3 B1 52.5 4 B1 51.7 5 B1 52.6 6 B1 53.2 7 B2 54.1 8 B2 53.7 9 B2 52.5 10 B2 55.2 11 B3 57.5 12 B3 56.3 13 B3 55.8 14 B3 56.9 15 B3 55.8 16 B4 55.3 17 B4 52.8 18 B4 53.5 19 B4 51.7

Example 5: Four batches of butter were prepared by churning cream. Three fat levels in cream and three churning speeds were taken for preparation of butter. The moisture content (mc) in butter for the four batches is given below:

Batch

Fat Level

30% 40% 50%

Churning Speed (RPM)

1600 1700 1800 1600 1700 1800 1600 1700 1800 1 66.8 69.1 69.3 77.9 77.8 76.1 71.7 72.2 71.9 2 68.1 68.9 70.1 77.6 78.8 76.2 72.3 71.9 71.5 3 67.0 69.2 70.0 77.7 78.7 76.0 72.3 72.2 71.8 4 67.5 68.1 68.9 77.1 77.6 75.7 71.5 71.4 71.3 Write a SAS program to prepare a data file to perform two-way ANOVA with interaction.

This is an example of two-way classification of data with observations per cell ANOVA. The program to create SAS data set file for this problem is given below (Note: Codes used for the variables, fat level (f_level), speed (speed) and batch (batch) are 1-3, 1-3 and 1-4, respectively) :

(22)

DATA aksas.twoway_m;

INPUT f_level speed batch mc;

CARDS;

1 1 1 66.8 1 2 1 69.1 1 3 1 69.3 2 1 1 77.9 2 2 1 77.8 2 3 1 76.1 3 1 1 71.7 3 2 1 72.2 3 3 1 71.9 1 1 2 68.1 1 2 2 68.9 1 3 2 70.1 2 1 2 77.6 2 2 2 78.7 2 3 2 76.4 3 1 2 72.3 3 2 2 71.9 3 3 2 71.8 1 1 3 67.0 1 2 3 69.2 1 3 3 70.0 2 1 3 77.0 2 2 3 78.2 2 3 3 76.0 3 1 3 72.2 3 2 3 72.3 3 3 3 71.8 1 1 4 67.5 1 2 4 68.1 1 3 4 68.9 2 1 4 77.1 2 2 4 77.6 2 3 4 75.7 3 1 4 71.5 3 2 4 71.4 3 3 4 71.3

;

PROC GLM DATA = aksas.twoway_m;

CLASS f_level speed batch;

MODEL mc = f_level speed f_level*speed;

TITLE 'Analyse the effects of fat level and speed';

TITLE2 'Including Interaction';

RUN;

QUIT;

(23)

Program output (Example 5):

Analyse the effects of fat level and speed Including Interaction Obs f_level speed batch mc

1 1 1 1 66.8 2 1 2 1 69.1 3 1 3 1 69.3 4 2 1 1 77.9 5 2 2 1 77.8 6 2 3 1 76.1 7 3 1 1 71.7 8 3 2 1 72.2 9 3 3 1 71.9 10 1 1 2 68.1 11 1 2 2 68.9 12 1 3 2 70.1 13 2 1 2 77.6 14 2 2 2 78.7 15 2 3 2 76.4 16 3 1 2 72.3 17 3 2 2 71.9 18 3 3 2 71.8 19 1 1 3 67.0 20 1 2 3 69.2 21 1 3 3 70.0 22 2 1 3 77.0 23 2 2 3 78.2 24 2 3 3 76.0 25 3 1 3 72.2 26 3 2 3 72.3 27 3 3 3 71.8 28 1 1 4 67.5 29 1 2 4 68.1 30 1 3 4 68.9 31 2 1 4 77.1 32 2 2 4 77.6 33 2 3 4 75.7 34 3 1 4 71.5 35 3 2 4 71.4 36 3 3 4 71.3

(24)

Example programs (using data set, aksas.lactation.sas7bdat given in Example-1 above)

Ex.1. Let us learn how to work with SAS data set and to access and browse SAS data libraries.

The following program demonstrates how to generate a list of all the SAS files in the data library (aksas in this case) using _ALL_ keyword. Note that NODS option suppresses the descriptor portions of the data sets. NODS is used in conjunction with the keyword _ALL_.

LIBNAME aksas “d:\aksas”;

PROC CONTENTS DATA = aksas._ALL_ nods;

RUN;

Ex.2. The following program displays contents of the aksas.test data set file as well as prints listing of the data set according to specified condition and order.

OPTIONS NODATE;

LIBNAME aksas “d:\aksas”;

DATA aksas.test;

RUN;

PROC CONTENTS DATA = aksas.test;

RUN;

PROC PRINT DATA=aksas.test;

TITLE “State, Lactation and Breed wise Milk Yield Report”;

WHERE state=‘Haryana’;

VAR my lact;

RUN;

Output produced by SAS Program (Ex.2):

State, Lactation and Breed wise Milk Yield Report Obs MY LACT

1 18 1 2 19 2 3 21 3 4 20 4 5 23 1 6 25 2 7 22 3 8 21 4 9 22 1 10 24 2 11 23 3 12 23 4 13 27 1 14 26 2

(25)

15 25 3 16 25 4 Program explanation (Ex.2):

Two new statements, where and var have been introduced under PROC PRINT procedure, which need explanation. The other statements have been defined already. A where statement is used to select cases from a SAS data set that meet a particular condition. Thus, where statement subsets the input data by specifying certain condition(s) that each observation must meet before it is available for processing. The condition(s) that you define in a where statement is an arithmetic or logical expression that generally consists of a sequence of operands and operators. To compare character values, you must enclose them in single or double quotation marks and the values must match exactly, including capitalisation. Using the where statement might improve the efficiency of your SAS programs because SAS is not required to read all the observations in the input data set. Note: You can use only one where statement in a DATA step or a PROC step.

Now, let us further discuss the program described in this Example-2. The where statement is used here to define certain condition or compound condition (connected through AND, OR or NOT logical operators, etc. For more details on SAS operators visit:

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/

viewer.htm#a000780367.htm) to be tested by SAS for selective data extraction from source data set (aksas.lactation.sas7bdat in this case) into the destination data set (aksas.test.sas7bdat in this exercise program). Consider the following statement:

This instructs to SAS that extract all those cases from the original data set for which the state variable contains data value as Haryana and stores them into destination data set file aksas.test.

The ‘=’ sign used in such conditions as above is for comparison and not for equality. Note that state variable is of character data type; therefore, while matching its data value, Haryana is enclosed in single quote marks. If the variable is numeric type, we equate the data value to the variable as such, e.g., my > 18 (i.e., extract all the cases for which milk yield is greater than 18).

We can replace earlier condition with the following:

WHERE state=‘Haryana’ and my between 20 and 30;

This statement would extract all the cases having state as Haryana as well as milk yield lying in the interval [20, 30] (see output with this compound condition instead of earlier simple condition, at the end of this paragraph).

Now, consider the statement:

VAR my lact;

This indicates to SAS that while printing the information from the destination data set file, consider only variables listed under var statement, i.e., my and lact in this case. The information will be printed according to the order in which the variable names are mentioned under var statement irrespective of the original order in which these variables have been actually stored in

(26)

computer memory. Hence, we can alter the order of variables while printing the report without disturbing their original settings.

Output of SAS Program (Ex.2) with compound condition in where statement:

State, Lactation and Breed wise Milk Yield Report Obs MY LACT

1 21 3 2 20 4 3 23 1 4 25 2 5 22 3 6 21 4 7 22 1 8 24 2 9 23 3 10 23 4 11 27 1 12 26 2 13 25 3 14 25 4

Programs for practice (All the following programs use data set, aksas.lactation.sas7bdat given in Example-1 above)

Now the readers can try the following programs similar to the program discussed in Example-2 above. You will come across some new statements and options that are explained; however, you can comprehend the new statements through inbuilt SAS help.

Ex.1. Program to subset cases from the master or base data set file, lactation.sas7bdat into sub data set or target file, red_sin.sas7bdat pertaining to ‘Red Sindhi’ breed of dairy cattle for both the states. Logical operator OR is used to form the compound condition in PROC PRINT step;

this means that it will extract the case if either of the two conditions is found TRUE. (Note: This is unlike AND operator used in Example-2; which was used to test both the conditions to be TRUE simultaneously). As stated earlier, spellings and case for the pattern matching string say,

‘Panjab’ should be unique throughout. If you store the state as Panjab and search for Punjab or panjab, etc., SAS will not be able to find it even though the one exists in data set file!

SAS Program (Ex.1):

OPTIONS NODATE;

Data aksas.red_sin;

Set aksas.lactation;

RUN;

(27)

PROC CONTENTS DATA = aksas.red_sin;

RUN;

Proc Print Data=aksas.red_sin;

Title "State, Lactation and Breed wise Milk Yield Report";

Where (state=‘Haryana’ and breed=‘Red Sindhi’) or (state=‘Panjab’ and breed=‘Red Sindhi’);

Var lact my;

Run;

Ex.2. This program subsets and prints the Haryana state’s cases on variables breed, lact and my.

OPTIONS NODATE;

DATA aksas.lact_hry;

RUN;

PROC PRINT DATA=aksas.lact_hry;

VAR breed lact my;

RUN;

Ex.3. This program subsets and prints the Panjab state’s cases on variables state, breed, lact and my.

OPTIONS NODATE;

DATA aksas.lact_pb;

WHERE state = ‘Panjab’;

RUN;

PROC PRINT DATA=aksas.lact_pb;

VAR state breed lact my;

RUN;

Ex.4. This subsets and prints the Panjab state’s cases on variables state, breed, lact and my according to the criterion ‘my between 30 and 35’. This example demonstrates a different way to perform the operation ‘my between 30 and 35’. SAS provides different tools to perform such operations. Referring to the inbuilt Help is a good idea to learn new tools in SAS.

OPTIONS NODATE;

DATA aksas.lact_pb;

WHERE state = ‘Panjab’ and (my>=30 and my<=35);

RUN;

(28)

RUN;

Ex.5. This is similar to the Ex.4; and subsets and prints the Panjab state’s cases on variables state, breed, lact and my according to the criterion ‘my between 30 and 35’ using still another option to perform the same operation ‘my between 30 and 35’.

OPTIONS NODATE;

DATA aksas.lact_pb;

WHERE state = ‘Panjab’ and (30<=my<=35);

RUN;

Ex.6. This program is also similar to the Ex.4; and i) subsets and prints the Panjab state’s cases on variables state, breed, lact and my according to the criterion ‘my between 30 and 35’; and ii) subsets and prints the Haryana state’s cases on variables state, breed, lact and my according to the criterion ‘my > 22’. This program demonstrates the use of ‘like’ operator with where statement. The like operator selects cases by comparing the values of a character variable to a specified pattern (called pattern matching). The like operator is case sensitive. There are two special characters available for specifying a pattern:

a) Percent sign (%) – specifies that any number of characters can occupy that position. The following where expressions select all cases having state with a data value that starts with the letters ‘Pan’ and ‘H’, respectively. The data value contained in variable, state can be of any length.

WHERE state like ‘Pan%’;

WHERE state like ‘H%’;

b) Underscore (_) – matches just one character in the value for each underscore character.

You can specify more than one consecutive underscore character in a pattern, and you can specify a percent sign and an underscore in the same pattern.

WHERE state LIKE 'P_n%'; (Note: data may now contain mixed names like Panjab or Punjab)

WHERE state LIKE ‘Har_a%’; (Note: data may now contain mixed names like Haryana or Hariana)

(29)

SAS Programs (Ex.6):

i)

OPTIONS NODATE;

LIBNAME aksas "d:\aksas";

DATA aksas.lact_pb;

WHERE state LIKE "Pan%" and (my between 30 and 35);

RUN;

Program Output (Ex.6-i):

State, Lactation and Breed wise Milk Yield Report Obs STATE BREED LACT MY

1 Panjab Hariana 3 35 2 Panjab Tharparkar 1 30 3 Panjab Tharparkar 2 32 4 Panjab Tharparkar 4 30 5 Panjab Sahiwal 3 30 ii)

OPTIONS NODATE;

LIBNAME aksas "d:\aksas";

DATA aksas.lact_hry;

WHERE state LIKE "H%" and (my>22);

RUN;

PROC PRINT DATA=aksas.lact_hry;

RUN;

Program output (Ex.6-ii):

1 Haryana Tharparkar 1 23 2 Haryana Tharparkar 2 25 3 Haryana Sahiwal 2 24 4 Haryana Sahiwal 3 23

(30)

5 Haryana Sahiwal 4 23 6 Haryana Red Sindhi 1 27 7 Haryana Red Sindhi 2 26 8 Haryana Red Sindhi 3 25 9 Haryana Red Sindhi 4 25

Ex.7. This program subsets and prints the cases for the state of Haryana. This example demonstrates a different way to perform the operation through subsetting IF statement rather than where statement. The IF statement causes the DATA step to continue processing only those raw data records or those observations from a SAS data set that meet the condition of the expression that is specified in the IF statement. That is, if the expression is TRUE for the case (its value is neither 0 nor missing), SAS continues to execute statements in the DATA step and includes the current case in the data set. The resulting SAS data set(s) contains a subset of the original external file or SAS data set. If the expression is FALSE (its value is 0 or missing), no further statements are processed for that case, the current case is not written to the data set, and the remaining program statements in the DATA step are not executed. SAS immediately returns to the beginning of the DATA step because the subsetting IF statement does not require additional statements to stop processing cases. The subsetting IF statement is equivalent to this IF-THEN statement, e.g.

IF (condition)

THEN SAS statement;

While creating SAS data sets, you can use the subsetting IF statement when it is easier to specify a condition for including cases. When it is easier to specify a condition for excluding cases, use the DELETE statement. See examples below for understanding the logic.

The subsetting IF and the where statements are not equivalent! The two statements work differently and produce different output data sets in some cases. The most important differences are summarised below:

 The subsetting IF statement selects observations that have been read into the program data vector. The where statement selects observations before they are brought into the program data vector. The subsetting IF might be less efficient than the where statement because it must read each observation from the input data set into the program data vector.

 The subsetting IF statement and where statement can produce different results in DATA steps that interleave, merge, or update SAS data sets.

 The subsetting IF statement can select cases from an existing SAS data set or from raw data that are read with the INPUT statement. The where statement can select cases only from existing SAS data sets.

 The subsetting IF statement is executable; the where statement is not.

(31)

SAS Program (Ex.7):

OPTIONS NODATE;

DATA aksas.lact_h;

IF (state = ‘Haryana’) THEN DELETE;

PROC PRINT DATA=aksas.lact_h;

RUN;

Program Output (Ex.7):

1 Panjab Hariana 1 25 2 Panjab Hariana 2 36 3 Panjab Hariana 3 35 4 Panjab Hariana 4 37 5 Panjab Tharparkar 1 30 6 Panjab Tharparkar 2 32 7 Panjab Tharparkar 3 29 8 Panjab Tharparkar 4 30 9 Panjab Sahiwal 1 26 10 Panjab Sahiwal 2 29 11 Panjab Sahiwal 3 30 12 Panjab Sahiwal 4 38 13 Panjab Red Sindhi 1 29 14 Panjab Red Sindhi 2 25 15 Panjab Red Sindhi 3 27 16 Panjab Red Sindhi 4 28 Ex.8. This program is similar to the previous example.

OPTIONS NODATE;

DATA aksas.lact_p;

IF NOT (state LIKE ‘H%’) THEN DELETE;

PROC PRINT DATA=aksas.lact_p;

RUN;

(32)

State, Lactation and Breed wise Milk Yield Report

Obs STATE BREED LACT MY 1 Panjab Hariana 1 25 2 Panjab Hariana 2 36 3 Panjab Hariana 3 35 4 Panjab Hariana 4 37 5 Panjab Tharparkar 1 30 6 Panjab Tharparkar 2 32 7 Panjab Tharparkar 3 29 8 Panjab Tharparkar 4 30 9 Panjab Sahiwal 1 26 10 Panjab Sahiwal 2 29 11 Panjab Sahiwal 3 30 12 Panjab Sahiwal 4 38 13 Panjab Red Sindhi 1 29 14 Panjab Red Sindhi 2 25 15 Panjab Red Sindhi 3 27 16 Panjab Red Sindhi 4 28 Ex.9. This program is also similar to previous examples.

data aksas.lact_ph;

set aksas.lactation;

if not (my>=25)then delete;

proc print data=aksas.lact_ph;

run;

State, Lactation and Breed wise Milk Yield Report Obs STATE BREED LACT MY 1 Haryana Tharparkar 2 25 2 Haryana Red Sindhi 1 27 3 Haryana Red Sindhi 2 26 4 Haryana Red Sindhi 3 25 5 Haryana Red Sindhi 4 25 6 Panjab Hariana 1 25 7 Panjab Hariana 2 36 8 Panjab Hariana 3 35 9 Panjab Hariana 4 37

(33)

10 Panjab Tharparkar 1 30 11 Panjab Tharparkar 2 32 12 Panjab Tharparkar 3 29 13 Panjab Tharparkar 4 30 14 Panjab Sahiwal 1 26 15 Panjab Sahiwal 2 29 16 Panjab Sahiwal 3 30 17 Panjab Sahiwal 4 38 18 Panjab Red Sindhi 1 29 19 Panjab Red Sindhi 2 25 20 Panjab Red Sindhi 3 27 21 Panjab Red Sindhi 4 28

KEEP and DROP statements – dropping and keeping variables

SAS will automatically write to any data set created by a DATA step all variables in the input data set(s) along with all new variables created in the DATA step. If you don’t want all these variables, you can indicate this to SAS in one of several ways. KEEP and DROP statements are used often to control the number of variables read into and output into the data sets. You can create several variables during the data processing, but need to save only selected variables in the final data set. If you want to restrict the number of columns in output (or target) data set, use the following method. This will ensure that output data set is created with required variables only.

DATA aksas.lacpb (keep = state my);

RUN;

Alternately, you can specify the first statement as follows:

DATA aksas.lacpb;

KEEP state my;

RUN;

If you are reading a big data set into SAS and wish to extract only a few variables from it, use the following statements in the program.

DATA aksas.lacpb;

SET base_data (keep = state breed my);

RUN;

In the first case, SAS reads the entire data set lactation, even though you only intend to use a few variables. In the second case, SAS reads from storage only the three variables you intend to keep.

(34)

Note that we have to use such efficient methods to restrict the data read into the system so as to optimise the system resources such as SAS Libraries and shared drives.

The same way DROP statement can also be specified based on the data requirement of the user.

Now, try following simple programs.

Ex.10: The following programs demonstrate KEEP/DROP statements to subset data from master (base) data set into target data set.

i)

DATA aksas.lactpb;

KEEP breed lact my;

RUN;

PROC PRINT DATA=aksas.lactpb;

Run;

ii)

DATA aksas.lactpb;

DROP state;

RUN;

PROC PRINT DATA=aksas.lactpb;

VAR breed lact my;

RUN;

(35)

Suggested Readings & WWW Resources

[1] An Introduction to SAS Data Steps. Social Science Computing Cooperative, University of Wisconsin-Madison. Accessed May 23, 2013. Available online at:

http://www.ssc.wisc.edu/sscc/pubs/4-18.htm.

[2] SAS Institute Inc., SAS/IML User’s Guide, Version 8, Cary, NC: SAS Institute Inc., 1999. pp. 846. Accessed May 23, 2013. Available online at:

http://ciser.cornell.edu/sasdoc/saspdf/iml/chap6.pdf.

[3] Gupta S, 2005. WHERE vs. IF Statements: Knowing the Difference in How and When to Apply. Paper: 238-31. http://www2.sas.com/proceedings/sugi31/238-31.pdf.

[4] SAS Operators.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a00 0 780367.htm.

[5] Statistical Computing with SAS. The Institute for Digital Research and Education (IDRE) Research Technology Group. The University of California, Los Angeles (UCLA).

Accessed: May 24, 2013. http://www.ats.ucla.edu/stats and https://idre.ucla.edu/stats/sas.

[6] Kurian, J. S for SAS: A Beginners’ Guide. Accessed May 24, 2013. Available online at:

http://www.puzha.com/sasbook/s4sas.pdf.

[7] Reference Manual - Data Analysis using SAS. Accessed June 18, 2013. Available online at: http://www.iasri.res.in/sscnars/contentmain.htm.