• No results found

SQL Case Expression: An Alternative to DATA Step IF/THEN Statements

N/A
N/A
Protected

Academic year: 2021

Share "SQL Case Expression: An Alternative to DATA Step IF/THEN Statements"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

SQL Case Expression: An Alternative to DATA Step IF/THEN Statements

John Q. Zhang LCS Industries, Inc. Clifton NJ

ABSTRACT

This paper will show that case-expression

utilized in PROC SQL can be as efficient and flexible as

the logical expression IF/THEN statement. In addition to

the advantage of clear, brief code, PROC SQL can not

only perform procedural steps but data steps as well. This

paper will show how powerful a tool PROC SQL can be.

INTRODUCTION

Quite a few papers presented in the previous

conferences and current one have elaborated the

fundamentals of SAS

SQL procedure; the syntax like

CREATE, SELECT *, FROM, GROUP BY, ORDER

BY, WHERE, INTERSECT, UNION, EXCEPT,

EXISTS, INSERT and COALESCE function become

more and more familiar to the programmers and users. It

is not premature, definitely, to further exploit the

potential and capacity of PROC SQL with the objective

of benefiting the SAS data processing community. The

paper represents one attempt at this exploitation.

RATIONALE

The SQL CASE expression normally is

composed of case-operand, when-condition and

result-expression. They must be valid sql-expressions. The

syntax format goes like the following:

CASE<case-operand>

WHEN when condition THEN result-expression

<WHEN when-condition THEN result-expression>...

<ELSE result-expression>

END.

Multiple Nested CASE expression can also be used

correspondingly to nested IF/THEN statement:

CASE<case-operand>

WHEN when condition THEN(

CASE<case-operand>

WHEN when condition THEN(

result-expression)

<ELSE(result-expression)> END)

<ELSE(result-expression)> END.

Three versions are available of the SAS

second-order program (also called ‘program generator’). Two

versions are compiled by the base SAS DATA step, with

very little PROC SQL involved, while the third version

quite heavily leaned on PROC SQL (the complete

program presented in APPENDIX III).

The function of the programs is using SAS to

read COBOL copybooks (layouts)to create SAS programs

which are able to create SAS data sets ready for further

manipulation and analysis. Some of the copybooks are

longer than 500 lines. It would be considerably

time-consuming, tedious and error-prone to manually put in

the input statement, start position, variable names,

variable length and label statement, etc. Those programs

can take care of the problems and produce exactly the

same output in a few seconds.

To set up the input statement (a sample of a

copybook can be found in APPENDIX I)copybooks have

to be cut in 2 parts: the description part (called ‘dummy’

or ‘dummy1’ in the program generator); and the length

part (called ‘code’).

To create the variables using the description part

sequence numbers have to be concatenated with character

strings to avoid duplicates. After the variables are created

they can be concatenated with the description part itself

to set up label statement.

To set up the length of the variables data

transformation and manipulation of the length part of the

copybook are necessary, which is the major component of

the first 2 versions of the program generator, and where

IF/THEN statements or PROC SQL is heavily involved.

The basic operation is to take the number of the bytes

that the each variable occupies and express them as SAS

informats.

After the length of variables is set up, the

program can calculate the starting position of the

variables. First pick just the number (of bytes)in the

length, then sum the number for each variable using the

LAG function.

The first program uses IF/THEN statements in

SAS DATA step to assign SAS informats to the length of

variables created, i.e. to change COBOL notation to SAS

notation.

The CASE expression, specially the nested

CASE WHEN condition, supported in PROC SQL is

(2)

introduced in the second program to handle the same

task. The rest of the program is quite similar to the first

one. In APPENDIX IV part of the first program

(IF/THEN statements to set up length)is presented to

compare with the CASE expression utilized in the second

program.

The third program also uses DATA step

statements to manipulate the description part of the

copybook and the starting position of SAS variables as

well. But the PROC FORMAT procedure is used to

create SAS informats assigned to the variables

established.

The first program looks lengthier and more

redundant than the second as well as the third. The third

one works as well as the other two, and it is concise and

brief but a user-defined format, created by a SAS macro,

must be stored in a SAS catalog library.

CONCLUSION

After running the second program it can be seen

that the CASE expression of PROC SQL may completely

take the place of IF/THEN statements and result in the

same output(provided in APPENDIX II). It is more

efficient in terms of CPU time used. The code would be

more brief and concise combined with in-line views.

Therefore, another alternative to IF/THEN statements

would be available whenever a particular situation fits.

SAS users would be better armed in terms of choice of

techniques. It also shows the versatility of PROC SQL.

This paper does not intend to prove that PROC

SQL, especially the CASE expression, leaves nothing to

be desired in comparison with IF/THEN statements in

regular DATA step. In fact, it does have some

disadvantages.

For instance, if the situation is a multiple

condition with a single result CASE expressions would

work as well as IF/THEN statements, and the code even

briefer. This is what the 2nd program shows. However,

in the case of a single condition with multiple result the

CASE expression would appear lengthy and clumsy, even

though the output would be the same, for example,

IF A=1 THEN DO;

B=2;

C=3;

D=4;

... ;

END;

For PROC SQL, the code would be like the

following:

CASE WHEN(A=1)

THEN(2) END AS B,

CASE WHEN(A=1)

THEN(3) END AS C,

CASE WHEN(A=1)

THEN(4) END AS D....

The repetitive CASE WHEN condition makes

the code look awkward.

Secondly, some useful SAS functions and

operators are not supported in PROC SQL, definitely not

in CASE expression, such as the family of LAG

functions.

Note: this paper is just a experiment that

attempts to exploit and expand the capacity of PROC

SQL. Any comments, suggestions and advice would

certainly be welcome.

With respect to the application, the program

generators can be modified to read any flat file of layouts,

either in COBOL or in other languages, which could

readily facilitate SAS data step programming and

analysis.

The program generators appended to the paper

could set up the SAS programs that read only the

fixed-length-portion of COBOL files, in consideration of space.

Those for reading both fixed and variable-length-portion

COBOL files are also available but they become lengthier

and more complex.

ACKNOWLEDGMENTS

SAS is a registered trademark or trademark of SAS

Institute Inc. in the USA and other countries.

indicates

USA registration.

Thanks for the advice and information from Mr. Bernard

F. Kaine & Mr. Harold B. Legg.

CONTACT INFORMATION

John Q. Zhang

e-mail addr: jzhang@lcsi.com

(201)778-5588/EXT3486(O)

(212)567-6092(H)

(3)

APPENDIX I: SAMPLE OF COBOL COPYBOOK

*********************************************************** 96/08/30 * CPXXXXXX

* XXXXXXX STANDARDIZATION RECORD LAYOUT * LV009 ***********************************************************

* DPFF-FIELDS = STANDARDIZED RETURN FIELDS

* IN-FIELDS = ORIGINAL DATA PLUS COMPLETE NAME FIELD * PERS-FIELDS = PERSONS OUTPUT

*********************************************************** * PROJECT * 01 DPFF-SITE-RECORD. 03 DPFF-REC-LENGTH PIC X(05). 03 DPFF-ORIGINAL-LIST PIC X(03). 03 DPFF-SEQUENCE PIC X(08). 03 DPFF-STD-ADDRESS. 05 DPFF-TITLE PIC X(25). 05 DPFF-ADDRESS1 PIC X(10). 05 DPFF-ADDRESS2 PIC X(20). 05 DPFF-POST-CODE PIC X(08). 05 DPFF-COUNTRY PIC X(25). 07 PERS-GEOCODE PIC X(12). 07 PERS-GEOCODE-LEVEL PIC X(05). 07 PERS-STREET-KEY PIC X(05). 07 PERS-SEX-CODE PIC X(01).

03 DPFF-TOWN-POSITION PIC S9(09) COMP-3. 03 DPFF-TITLE-LENGTH PIC 9(02).

03 DPFF-TITLE-LENGTH2 PIC 9(04).

03 DPFF-TV-AREA-CODE PIC 9(04) COMP. 03 DPFF-PREMIS-POSITION PIC 9(08) COMP. 03 FILLER PIC X(45).

03 DPFF-CREATION-DATE PIC 9(06) COMP-3. 03 DPFF-THOROUGHFARE-POS PIC 9(03)V9 COMP-3. 03 DPFF-MAJOR-THORFR-POS PIC 9(03)V99 COMP-3. 03 DPFF-DEC-POSITION PIC S9(07)V999 COMP-3. 03 DPFF-COMPANY-INDICATOR PIC X(04). 03 DPFF-PRIVATE-ADDR-IND PIC X(04). 03 DPFF-FLAT-KEY PIC X(02). 03 DPFF-POSTCODE-DPS PIC X(02). 03 DPFF-LIST-CODES PIC X(10). 03 IN-ORIGINAL-RECORD. 05 IN-CESSATION-REASON PIC X(02). 05 IN-EXCHANGE-CODE PIC X(06). 05 IN-INSTALLATION-STATUS PIC X(01). 05 IN-CUSTOMER-CLASS PIC X(01). 05 IN-CUSTOMER-TYPE PIC X(02). 05 IN-END-DATE PIC X(08). 05 IN-START-DATE PIC X(08). 05 IN-SITE-INFO. 10 IN-TITLE PIC X(28). 10 IN-INITIALS PIC X(28). 05 IN-POSTCODE PIC X(10). 03 FILLER PIC X(1808).

APPENDIX II: OUTPUT

DATA XXXX; INFILE YYYY; INPUT @0001 RECLE14 $CHAR05. @0006 ORIGI15 $CHAR03. @0009 SEQUE16 $CHAR08. @0017 TITLE18 $CHAR25. @0042 ADDRE19 $CHAR10. @0052 ADDRE20 $CHAR20. @0072 POSTC21 $CHAR08. @0080 COUNT22 $CHAR25. @0105 GEOCO23 $CHAR12. @0117 GEOCO24 $CHAR05. @0122 STREE25 $CHAR05. @0127 SEXCO26 $CHAR01. @0128 TOWNP27 PD5. @0133 TITLE28 2. @0135 TITLE29 4.
(4)

@0139 TVARE30 PIB2. @0141 PREMI31 PIB4. /*@0145 FILL32 $CHAR45. */ @0190 CREAT33 PD3. @0193 THORO34 PD2.1 @0195 MAJOR35 PD3.2 @0198 DECPO36 PD5.3 @0203 COMPA37 $CHAR04. @0207 PRIVA38 $CHAR04. @0211 FLATK39 $CHAR02. @0213 POSTC40 $CHAR02. @0215 LISTC41 $CHAR10. @0225 CESSA43 $CHAR02. @0227 EXCHA44 $CHAR06. @0233 INSTA45 $CHAR01. @0234 CUSTO46 $CHAR01. @0235 CUSTO47 $CHAR02. @0237 ENDDA48 $CHAR08. @0245 START49 $CHAR08. @0253 TITLE51 $CHAR28. @0281 INITI52 $CHAR28. @0309 POSTC53 $CHAR10. /*@0319 FILL54 $CHAR1808.*/ ; LABEL RECLE14="REC LENGTH" ORIGI15="ORIGINAL LIST" SEQUE16="SEQUENCE" TITLE18="TITLE" ADDRE19="ADDRESS1" ADDRE20="ADDRESS2" POSTC21="POST CODE" COUNT22="COUNTRY" GEOCO23="GEOCODE" GEOCO24="GEOCODE LEVEL" STREE25="STREET KEY" SEXCO26="SEX CODE" TOWNP27="TOWN POSITION" TITLE28="TITLE LENGTH" TITLE29="TITLE LENGTH2" TVARE30="TV AREA CODE" PREMI31="PREMIS POSITION" CREAT33="CREATION DATE" THORO34="THOROUGHFARE POS" MAJOR35="MAJOR THORFR POS" DECPO36="DEC POSITION" COMPA37="COMPANY INDICATOR" PRIVA38="PRIVATE ADDR IND" FLATK39="FLAT KEY" POSTC40="POSTCODE DPS" LISTC41="LIST CODES" CESSA43="CESSATION REASON" EXCHA44="EXCHANGE CODE" INSTA45="INSTALLATION STATUS" CUSTO46="CUSTOMER CLASS" CUSTO47="CUSTOMER TYPE" ENDDA48="END DATE" START49="START DATE" TITLE51="TITLE" INITI52="INITIALS" POSTC53="POSTCODE" ; RUN; QUIT;

(5)

APPENDIX III: THE 2nd PROGRAM(PROC SQL version)

FILENAME X 'CPXXXXXX.SAS';

OPTIONS LS=132 MPRINT;

DATA INPUT;

INFILE X MISSOVER LENGTH=L; INPUT DUMMY $VARYING68. L; DUMMY=TRANSLATE(DUMMY,' ','-'); PIC=INDEXW(DUMMY,'PIC'); SEQ=_N_;

IF PIC THEN DO;

DUMMY1=SUBSTR(DUMMY,1,PIC-1); CODE=SUBSTR(DUMMY,PIC,INDEX(DUMMY,'.')-PIC+1); END; ELSE DELETE; FILL=INDEXW(DUMMY,'FILLER'); L1=INDEX(CODE,'('); L2=INDEX(CODE,')'); X=INDEX(CODE,'X('); T9=INDEX(CODE,'9('); T=INDEX(CODE,'3.'); V=INDEX(CODE,'V'); V9=INDEX(CODE,'V9'); V99=INDEX(CODE,'V99'); V999=INDEX(CODE,'V999'); COMP=INDEX(CODE,'COMP');

IF PIC & FILL=0 & INDEXC(DUMMY,'0123456789') THEN WORD=SCAN(DUMMY,2); RUN;

PROC SQL;

CREATE VIEW PICK AS SELECT UNIQUE WORD FROM INPUT WHERE WORD NE ' '; QUIT; DATA _NULL_; SET PICK; S+1; CALL SYMPUT('SYS'||LEFT(PUT(S,2.)),TRIM(LEFT(WORD))); CALL SYMPUT('CON',PUT(S,2.)); RUN; %MACRO SRCH1; %DO J=1 %TO &CON;

Y&J=INDEX(DUMMY,"&&SYS&J"); %END;

%MEND;

/*SET UP FIELDS, LENGTH AND LABEL*/

DATA INPUT2; SET INPUT; %SRCH1

PROC SQL;

CREATE VIEW MANU AS

SELECT LENG,FIELD,LENGTH,TRIM(FIELD)||'='||'"'||TRIM(LEFT(DSCRPTN))||'"' AS LABEL FROM(

SELECT CASE WHEN(INDEX(LENGTH,'.'))

THEN(COMPRESS(SUBSTR(LENGTH,1,INDEX(LENGTH,'.')-1),'$CHARPIBD.')) ELSE(LENGTH) END AS LENG, FIELD, LENGTH FROM(

SELECT CASE %MACRO SRCH2; %DO K=1 %TO &CON;

WHEN(Y&K AND LENGTH(COMPRESS(DUMMY1))>=5)

THEN(SUBSTR(COMPRESS(SUBSTR(DUMMY1,Y&K+LENGTH("&&SYS&K"))),1,5)||LEFT(PUT(SEQ,3.))) WHEN(Y&K AND LENGTH(COMPRESS(DUMMY1))<5)

THEN(COMPRESS(DUMMY1)||LEFT(PUT(SEQ,3.))) %END;

%MEND; %SRCH2

WHEN(FILL) THEN(SUBSTR(DUMMY1,FILL,4)||LEFT(PUT(SEQ,3.))) ELSE(DUMMY1) END AS FIELD,

CASE %MACRO SRCH3; %DO L=1 %TO &CON; WHEN(Y&L)

(6)

THEN(SUBSTR(DUMMY1,Y&L+LENGTH("&&SYS&L"),PIC-Y&L-LENGTH("&&SYS&L"))) %END;

%MEND; %SRCH3

ELSE(DUMMY1) END AS DSCRPTN, CASE WHEN(X & T9=0)

THEN(COMPRESS('$CHAR'||SUBSTR(CODE,L1+1,L2-L1-1)||'.')) WHEN(X=0 & COMP=0) THEN(CASE WHEN(T9)

THEN(CASE WHEN(V9=0) THEN(

COMPRESS(PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.),3.)||'.')) WHEN(V9 & V99=0 & V999=0) THEN(

COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1),3.)||'.1')) WHEN(V9 & V99 & V999=0) THEN(

COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2),3.)||'.2')) WHEN(V9 & V99 & V999) THEN(

COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3),3.)||'.3')) ELSE(CODE) END)

ELSE(CODE) END) WHEN(X=0 & COMP>0) THEN(CASE WHEN(T=0 & T9) THEN(

COMPRESS('PIB'||PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2),3.)||'.')) WHEN(T) THEN(

CASE WHEN(T9) THEN( CASE WHEN(V9=0) THEN(

COMPRESS('PD'||PUT(CEIL(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2),3.)||'.')) WHEN(V9 & V99=0 & V999=0) THEN(

COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(

CODE,L1+1,L2-L1-1),3.)+1)/2),3.)||'.1')) WHEN(V9 & V99 & V999=0) THEN(

COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(

CODE,L1+1,L2-L1-1),3.)+2)/2),3.)||'.2')) WHEN(V9 & V99 & V999) THEN(

COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(

CODE,L1+1,L2-L1-1),3.)+3)/2),3.)||'.3')) ELSE(CODE) END)

ELSE(CODE) END) ELSE(CODE) END) ELSE(CODE) END AS LENGTH FROM INPUT2)); QUIT;

/* SET UP STARTING POSITIONS */

DATA START(KEEP=START FIELD LENGTH LABEL); IF _N_=1 THEN STRT=1;

SET MANU;

STRT+INPUT(LENG,4.); STAT=LAG(STRT); IF STAT=. THEN STAT=1;

START='@'||LEFT(PUT(STAT,Z4.)); RUN;

/* WRITE TO A OUTPUT FILE AND MAKE A SAS PROGRAM READY */

DATA _NULL_; IF _N_=1 THEN PUT @5 'DATA XXXX;' /@7 'INFILE YYYY;' /@7 'INPUT'; DO UNTIL(END1); SET START END=END1;

IF FIELD ^=:'FILL' THEN PUT @10 START @20 FIELD @31 LENGTH; ELSE PUT @8 '/*' @10 START @20 FIELD @31 LENGTH @41 '*/'; IF END1 THEN PUT @10 ';'

/@7 ' ' /@7 'LABEL'; END;

DO UNTIL(END2); SET START END=END2;

IF INDEX(LABEL,'FILLER')=0 THEN PUT @10 LABEL; IF END2 THEN PUT @7 ';' /@7 'RUN;';

END; RUN; QUIT;

(7)

APPENDIX IV: corresponding IF/THEN statements

...

...

IF PIC & X THEN DO;

LENGTH='$CHAR'||SUBSTR(CODE,L1+1,L2-L1-1)||'.'; END;

ELSE IF PIC & X=0 THEN DO; IF COMP=0 & T=0 THEN DO; IF T9 & V=0 THEN DO;

LENGTH=COMPRESS(PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.),3.)||'.'); END;

ELSE IF T9 & V THEN DO; IF V9 & V99=0 THEN DO;

LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1),3.)||'.1'); END;

ELSE IF V9 & V99 & V999=0 THEN DO;

LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2),3.)||'.2'); END;

ELSE IF V9 & V99 & V999 THEN DO;

LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3),3.)||'.3'); END;

END; END;

ELSE IF COMP & T=0 THEN DO; IF T9 & V=0 THEN DO;

LENGTH=COMPRESS('PIB'||PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2,3.)||'.'); END;

END;

ELSE IF COMP & T THEN DO; IF T9 & V=0 THEN DO;

LENGTH=COMPRESS('PD'||PUT((CEIL(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2)),3.)||'.'); END;

ELSE IF T9 & V THEN DO; IF V9 & V99=0 THEN DO;

LENGTH=COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1)/2),3.)||'.1'); END;

ELSE IF V9 & V99 & V999=0 THEN DO;

LENGTH=COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2)/2),3.)||'.2'); END;

ELSE IF V9 & V99 & V999 THEN DO;

LENGTH=COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3)/2),3.)||'.3'); END; END; END; END; ... ...

References

Related documents

Lafler, Kirk Paul (2011), “Exploring DATA Step Merges and PROC SQL Joins,” Proceedings of the 2011 PharmaSUG Conference, Software Intelligence Corporation, Spring Valley, CA,

Lafler, Kirk Paul (2010), “Exploring DATA Step Merges and PROC SQL Joins,” South Central SAS Users Group (SCSUG) 2010 Conference, Software Intelligence Corporation, Spring Valley,

Attendees explore examples that contrast DATA step versus PROC SQL programming techniques to conduct conditional logic scenarios, one-to-one match-merges and

When it comes to performing PROC SQL joins, users supply the names of the tables for joining along with the join conditions, and the PROC SQL optimizer determines which of

Explicit PASSTHRU will execute T-SQL code syntax inside SQL Server, while implicit PASSTHRU will use your SAS DATA step or PROC SQL code.. Several useful options will

Since every installation of SAS comes with sample data in the SASHELP library, the data file SASHELP.SHOES will be used to demonstrate the PROC SQL code used throughout this paper.

create table tablename as select [distinct]. column1, column2, [*], … from library.table where expression order

The survey applications use SAS through PROC SQL to read data from an existing SQL Server database or SAS macro language to update or insert data into SQL Server tables..