SQL Case Expression: An Alternative to DATA Step IF/THEN Statements
John Q. Zhang LCS Industries, Inc. Clifton NJ
ABSTRACT
This paper will show that case-expression
utilized in PROC SQL can be as efficient and flexible as
the logical expression IF/THEN statement. In addition to
the advantage of clear, brief code, PROC SQL can not
only perform procedural steps but data steps as well. This
paper will show how powerful a tool PROC SQL can be.
INTRODUCTION
Quite a few papers presented in the previous
conferences and current one have elaborated the
fundamentals of SAS
SQL procedure; the syntax like
CREATE, SELECT *, FROM, GROUP BY, ORDER
BY, WHERE, INTERSECT, UNION, EXCEPT,
EXISTS, INSERT and COALESCE function become
more and more familiar to the programmers and users. It
is not premature, definitely, to further exploit the
potential and capacity of PROC SQL with the objective
of benefiting the SAS data processing community. The
paper represents one attempt at this exploitation.
RATIONALE
The SQL CASE expression normally is
composed of case-operand, when-condition and
result-expression. They must be valid sql-expressions. The
syntax format goes like the following:
CASE<case-operand>
WHEN when condition THEN result-expression
<WHEN when-condition THEN result-expression>...
<ELSE result-expression>
END.
Multiple Nested CASE expression can also be used
correspondingly to nested IF/THEN statement:
CASE<case-operand>
WHEN when condition THEN(
CASE<case-operand>
WHEN when condition THEN(
result-expression)
<ELSE(result-expression)> END)
<ELSE(result-expression)> END.
Three versions are available of the SAS
second-order program (also called ‘program generator’). Two
versions are compiled by the base SAS DATA step, with
very little PROC SQL involved, while the third version
quite heavily leaned on PROC SQL (the complete
program presented in APPENDIX III).
The function of the programs is using SAS to
read COBOL copybooks (layouts)to create SAS programs
which are able to create SAS data sets ready for further
manipulation and analysis. Some of the copybooks are
longer than 500 lines. It would be considerably
time-consuming, tedious and error-prone to manually put in
the input statement, start position, variable names,
variable length and label statement, etc. Those programs
can take care of the problems and produce exactly the
same output in a few seconds.
To set up the input statement (a sample of a
copybook can be found in APPENDIX I)copybooks have
to be cut in 2 parts: the description part (called ‘dummy’
or ‘dummy1’ in the program generator); and the length
part (called ‘code’).
To create the variables using the description part
sequence numbers have to be concatenated with character
strings to avoid duplicates. After the variables are created
they can be concatenated with the description part itself
to set up label statement.
To set up the length of the variables data
transformation and manipulation of the length part of the
copybook are necessary, which is the major component of
the first 2 versions of the program generator, and where
IF/THEN statements or PROC SQL is heavily involved.
The basic operation is to take the number of the bytes
that the each variable occupies and express them as SAS
informats.
After the length of variables is set up, the
program can calculate the starting position of the
variables. First pick just the number (of bytes)in the
length, then sum the number for each variable using the
LAG function.
The first program uses IF/THEN statements in
SAS DATA step to assign SAS informats to the length of
variables created, i.e. to change COBOL notation to SAS
notation.
The CASE expression, specially the nested
CASE WHEN condition, supported in PROC SQL is
introduced in the second program to handle the same
task. The rest of the program is quite similar to the first
one. In APPENDIX IV part of the first program
(IF/THEN statements to set up length)is presented to
compare with the CASE expression utilized in the second
program.
The third program also uses DATA step
statements to manipulate the description part of the
copybook and the starting position of SAS variables as
well. But the PROC FORMAT procedure is used to
create SAS informats assigned to the variables
established.
The first program looks lengthier and more
redundant than the second as well as the third. The third
one works as well as the other two, and it is concise and
brief but a user-defined format, created by a SAS macro,
must be stored in a SAS catalog library.
CONCLUSION
After running the second program it can be seen
that the CASE expression of PROC SQL may completely
take the place of IF/THEN statements and result in the
same output(provided in APPENDIX II). It is more
efficient in terms of CPU time used. The code would be
more brief and concise combined with in-line views.
Therefore, another alternative to IF/THEN statements
would be available whenever a particular situation fits.
SAS users would be better armed in terms of choice of
techniques. It also shows the versatility of PROC SQL.
This paper does not intend to prove that PROC
SQL, especially the CASE expression, leaves nothing to
be desired in comparison with IF/THEN statements in
regular DATA step. In fact, it does have some
disadvantages.
For instance, if the situation is a multiple
condition with a single result CASE expressions would
work as well as IF/THEN statements, and the code even
briefer. This is what the 2nd program shows. However,
in the case of a single condition with multiple result the
CASE expression would appear lengthy and clumsy, even
though the output would be the same, for example,
IF A=1 THEN DO;
B=2;
C=3;
D=4;
... ;
END;
For PROC SQL, the code would be like the
following:
CASE WHEN(A=1)
THEN(2) END AS B,
CASE WHEN(A=1)
THEN(3) END AS C,
CASE WHEN(A=1)
THEN(4) END AS D....
The repetitive CASE WHEN condition makes
the code look awkward.
Secondly, some useful SAS functions and
operators are not supported in PROC SQL, definitely not
in CASE expression, such as the family of LAG
functions.
Note: this paper is just a experiment that
attempts to exploit and expand the capacity of PROC
SQL. Any comments, suggestions and advice would
certainly be welcome.
With respect to the application, the program
generators can be modified to read any flat file of layouts,
either in COBOL or in other languages, which could
readily facilitate SAS data step programming and
analysis.
The program generators appended to the paper
could set up the SAS programs that read only the
fixed-length-portion of COBOL files, in consideration of space.
Those for reading both fixed and variable-length-portion
COBOL files are also available but they become lengthier
and more complex.
ACKNOWLEDGMENTS
SAS is a registered trademark or trademark of SAS
Institute Inc. in the USA and other countries.
indicates
USA registration.
Thanks for the advice and information from Mr. Bernard
F. Kaine & Mr. Harold B. Legg.
CONTACT INFORMATION
John Q. Zhang
e-mail addr: jzhang@lcsi.com
(201)778-5588/EXT3486(O)
(212)567-6092(H)
APPENDIX I: SAMPLE OF COBOL COPYBOOK
*********************************************************** 96/08/30 * CPXXXXXX
* XXXXXXX STANDARDIZATION RECORD LAYOUT * LV009 ***********************************************************
* DPFF-FIELDS = STANDARDIZED RETURN FIELDS
* IN-FIELDS = ORIGINAL DATA PLUS COMPLETE NAME FIELD * PERS-FIELDS = PERSONS OUTPUT
*********************************************************** * PROJECT * 01 DPFF-SITE-RECORD. 03 DPFF-REC-LENGTH PIC X(05). 03 DPFF-ORIGINAL-LIST PIC X(03). 03 DPFF-SEQUENCE PIC X(08). 03 DPFF-STD-ADDRESS. 05 DPFF-TITLE PIC X(25). 05 DPFF-ADDRESS1 PIC X(10). 05 DPFF-ADDRESS2 PIC X(20). 05 DPFF-POST-CODE PIC X(08). 05 DPFF-COUNTRY PIC X(25). 07 PERS-GEOCODE PIC X(12). 07 PERS-GEOCODE-LEVEL PIC X(05). 07 PERS-STREET-KEY PIC X(05). 07 PERS-SEX-CODE PIC X(01).
03 DPFF-TOWN-POSITION PIC S9(09) COMP-3. 03 DPFF-TITLE-LENGTH PIC 9(02).
03 DPFF-TITLE-LENGTH2 PIC 9(04).
03 DPFF-TV-AREA-CODE PIC 9(04) COMP. 03 DPFF-PREMIS-POSITION PIC 9(08) COMP. 03 FILLER PIC X(45).
03 DPFF-CREATION-DATE PIC 9(06) COMP-3. 03 DPFF-THOROUGHFARE-POS PIC 9(03)V9 COMP-3. 03 DPFF-MAJOR-THORFR-POS PIC 9(03)V99 COMP-3. 03 DPFF-DEC-POSITION PIC S9(07)V999 COMP-3. 03 DPFF-COMPANY-INDICATOR PIC X(04). 03 DPFF-PRIVATE-ADDR-IND PIC X(04). 03 DPFF-FLAT-KEY PIC X(02). 03 DPFF-POSTCODE-DPS PIC X(02). 03 DPFF-LIST-CODES PIC X(10). 03 IN-ORIGINAL-RECORD. 05 IN-CESSATION-REASON PIC X(02). 05 IN-EXCHANGE-CODE PIC X(06). 05 IN-INSTALLATION-STATUS PIC X(01). 05 IN-CUSTOMER-CLASS PIC X(01). 05 IN-CUSTOMER-TYPE PIC X(02). 05 IN-END-DATE PIC X(08). 05 IN-START-DATE PIC X(08). 05 IN-SITE-INFO. 10 IN-TITLE PIC X(28). 10 IN-INITIALS PIC X(28). 05 IN-POSTCODE PIC X(10). 03 FILLER PIC X(1808).
APPENDIX II: OUTPUT
DATA XXXX; INFILE YYYY; INPUT @0001 RECLE14 $CHAR05. @0006 ORIGI15 $CHAR03. @0009 SEQUE16 $CHAR08. @0017 TITLE18 $CHAR25. @0042 ADDRE19 $CHAR10. @0052 ADDRE20 $CHAR20. @0072 POSTC21 $CHAR08. @0080 COUNT22 $CHAR25. @0105 GEOCO23 $CHAR12. @0117 GEOCO24 $CHAR05. @0122 STREE25 $CHAR05. @0127 SEXCO26 $CHAR01. @0128 TOWNP27 PD5. @0133 TITLE28 2. @0135 TITLE29 4.@0139 TVARE30 PIB2. @0141 PREMI31 PIB4. /*@0145 FILL32 $CHAR45. */ @0190 CREAT33 PD3. @0193 THORO34 PD2.1 @0195 MAJOR35 PD3.2 @0198 DECPO36 PD5.3 @0203 COMPA37 $CHAR04. @0207 PRIVA38 $CHAR04. @0211 FLATK39 $CHAR02. @0213 POSTC40 $CHAR02. @0215 LISTC41 $CHAR10. @0225 CESSA43 $CHAR02. @0227 EXCHA44 $CHAR06. @0233 INSTA45 $CHAR01. @0234 CUSTO46 $CHAR01. @0235 CUSTO47 $CHAR02. @0237 ENDDA48 $CHAR08. @0245 START49 $CHAR08. @0253 TITLE51 $CHAR28. @0281 INITI52 $CHAR28. @0309 POSTC53 $CHAR10. /*@0319 FILL54 $CHAR1808.*/ ; LABEL RECLE14="REC LENGTH" ORIGI15="ORIGINAL LIST" SEQUE16="SEQUENCE" TITLE18="TITLE" ADDRE19="ADDRESS1" ADDRE20="ADDRESS2" POSTC21="POST CODE" COUNT22="COUNTRY" GEOCO23="GEOCODE" GEOCO24="GEOCODE LEVEL" STREE25="STREET KEY" SEXCO26="SEX CODE" TOWNP27="TOWN POSITION" TITLE28="TITLE LENGTH" TITLE29="TITLE LENGTH2" TVARE30="TV AREA CODE" PREMI31="PREMIS POSITION" CREAT33="CREATION DATE" THORO34="THOROUGHFARE POS" MAJOR35="MAJOR THORFR POS" DECPO36="DEC POSITION" COMPA37="COMPANY INDICATOR" PRIVA38="PRIVATE ADDR IND" FLATK39="FLAT KEY" POSTC40="POSTCODE DPS" LISTC41="LIST CODES" CESSA43="CESSATION REASON" EXCHA44="EXCHANGE CODE" INSTA45="INSTALLATION STATUS" CUSTO46="CUSTOMER CLASS" CUSTO47="CUSTOMER TYPE" ENDDA48="END DATE" START49="START DATE" TITLE51="TITLE" INITI52="INITIALS" POSTC53="POSTCODE" ; RUN; QUIT;
APPENDIX III: THE 2nd PROGRAM(PROC SQL version)
FILENAME X 'CPXXXXXX.SAS';OPTIONS LS=132 MPRINT;
DATA INPUT;
INFILE X MISSOVER LENGTH=L; INPUT DUMMY $VARYING68. L; DUMMY=TRANSLATE(DUMMY,' ','-'); PIC=INDEXW(DUMMY,'PIC'); SEQ=_N_;
IF PIC THEN DO;
DUMMY1=SUBSTR(DUMMY,1,PIC-1); CODE=SUBSTR(DUMMY,PIC,INDEX(DUMMY,'.')-PIC+1); END; ELSE DELETE; FILL=INDEXW(DUMMY,'FILLER'); L1=INDEX(CODE,'('); L2=INDEX(CODE,')'); X=INDEX(CODE,'X('); T9=INDEX(CODE,'9('); T=INDEX(CODE,'3.'); V=INDEX(CODE,'V'); V9=INDEX(CODE,'V9'); V99=INDEX(CODE,'V99'); V999=INDEX(CODE,'V999'); COMP=INDEX(CODE,'COMP');
IF PIC & FILL=0 & INDEXC(DUMMY,'0123456789') THEN WORD=SCAN(DUMMY,2); RUN;
PROC SQL;
CREATE VIEW PICK AS SELECT UNIQUE WORD FROM INPUT WHERE WORD NE ' '; QUIT; DATA _NULL_; SET PICK; S+1; CALL SYMPUT('SYS'||LEFT(PUT(S,2.)),TRIM(LEFT(WORD))); CALL SYMPUT('CON',PUT(S,2.)); RUN; %MACRO SRCH1; %DO J=1 %TO &CON;
Y&J=INDEX(DUMMY,"&&SYS&J"); %END;
%MEND;
/*SET UP FIELDS, LENGTH AND LABEL*/
DATA INPUT2; SET INPUT; %SRCH1
PROC SQL;
CREATE VIEW MANU AS
SELECT LENG,FIELD,LENGTH,TRIM(FIELD)||'='||'"'||TRIM(LEFT(DSCRPTN))||'"' AS LABEL FROM(
SELECT CASE WHEN(INDEX(LENGTH,'.'))
THEN(COMPRESS(SUBSTR(LENGTH,1,INDEX(LENGTH,'.')-1),'$CHARPIBD.')) ELSE(LENGTH) END AS LENG, FIELD, LENGTH FROM(
SELECT CASE %MACRO SRCH2; %DO K=1 %TO &CON;
WHEN(Y&K AND LENGTH(COMPRESS(DUMMY1))>=5)
THEN(SUBSTR(COMPRESS(SUBSTR(DUMMY1,Y&K+LENGTH("&&SYS&K"))),1,5)||LEFT(PUT(SEQ,3.))) WHEN(Y&K AND LENGTH(COMPRESS(DUMMY1))<5)
THEN(COMPRESS(DUMMY1)||LEFT(PUT(SEQ,3.))) %END;
%MEND; %SRCH2
WHEN(FILL) THEN(SUBSTR(DUMMY1,FILL,4)||LEFT(PUT(SEQ,3.))) ELSE(DUMMY1) END AS FIELD,
CASE %MACRO SRCH3; %DO L=1 %TO &CON; WHEN(Y&L)
THEN(SUBSTR(DUMMY1,Y&L+LENGTH("&&SYS&L"),PIC-Y&L-LENGTH("&&SYS&L"))) %END;
%MEND; %SRCH3
ELSE(DUMMY1) END AS DSCRPTN, CASE WHEN(X & T9=0)
THEN(COMPRESS('$CHAR'||SUBSTR(CODE,L1+1,L2-L1-1)||'.')) WHEN(X=0 & COMP=0) THEN(CASE WHEN(T9)
THEN(CASE WHEN(V9=0) THEN(
COMPRESS(PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.),3.)||'.')) WHEN(V9 & V99=0 & V999=0) THEN(
COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1),3.)||'.1')) WHEN(V9 & V99 & V999=0) THEN(
COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2),3.)||'.2')) WHEN(V9 & V99 & V999) THEN(
COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3),3.)||'.3')) ELSE(CODE) END)
ELSE(CODE) END) WHEN(X=0 & COMP>0) THEN(CASE WHEN(T=0 & T9) THEN(
COMPRESS('PIB'||PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2),3.)||'.')) WHEN(T) THEN(
CASE WHEN(T9) THEN( CASE WHEN(V9=0) THEN(
COMPRESS('PD'||PUT(CEIL(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2),3.)||'.')) WHEN(V9 & V99=0 & V999=0) THEN(
COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(
CODE,L1+1,L2-L1-1),3.)+1)/2),3.)||'.1')) WHEN(V9 & V99 & V999=0) THEN(
COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(
CODE,L1+1,L2-L1-1),3.)+2)/2),3.)||'.2')) WHEN(V9 & V99 & V999) THEN(
COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(
CODE,L1+1,L2-L1-1),3.)+3)/2),3.)||'.3')) ELSE(CODE) END)
ELSE(CODE) END) ELSE(CODE) END) ELSE(CODE) END AS LENGTH FROM INPUT2)); QUIT;
/* SET UP STARTING POSITIONS */
DATA START(KEEP=START FIELD LENGTH LABEL); IF _N_=1 THEN STRT=1;
SET MANU;
STRT+INPUT(LENG,4.); STAT=LAG(STRT); IF STAT=. THEN STAT=1;
START='@'||LEFT(PUT(STAT,Z4.)); RUN;
/* WRITE TO A OUTPUT FILE AND MAKE A SAS PROGRAM READY */
DATA _NULL_; IF _N_=1 THEN PUT @5 'DATA XXXX;' /@7 'INFILE YYYY;' /@7 'INPUT'; DO UNTIL(END1); SET START END=END1;
IF FIELD ^=:'FILL' THEN PUT @10 START @20 FIELD @31 LENGTH; ELSE PUT @8 '/*' @10 START @20 FIELD @31 LENGTH @41 '*/'; IF END1 THEN PUT @10 ';'
/@7 ' ' /@7 'LABEL'; END;
DO UNTIL(END2); SET START END=END2;
IF INDEX(LABEL,'FILLER')=0 THEN PUT @10 LABEL; IF END2 THEN PUT @7 ';' /@7 'RUN;';
END; RUN; QUIT;
APPENDIX IV: corresponding IF/THEN statements
...
...
IF PIC & X THEN DO;
LENGTH='$CHAR'||SUBSTR(CODE,L1+1,L2-L1-1)||'.'; END;
ELSE IF PIC & X=0 THEN DO; IF COMP=0 & T=0 THEN DO; IF T9 & V=0 THEN DO;
LENGTH=COMPRESS(PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.),3.)||'.'); END;
ELSE IF T9 & V THEN DO; IF V9 & V99=0 THEN DO;
LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1),3.)||'.1'); END;
ELSE IF V9 & V99 & V999=0 THEN DO;
LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2),3.)||'.2'); END;
ELSE IF V9 & V99 & V999 THEN DO;
LENGTH=COMPRESS(PUT((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3),3.)||'.3'); END;
END; END;
ELSE IF COMP & T=0 THEN DO; IF T9 & V=0 THEN DO;
LENGTH=COMPRESS('PIB'||PUT(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2,3.)||'.'); END;
END;
ELSE IF COMP & T THEN DO; IF T9 & V=0 THEN DO;
LENGTH=COMPRESS('PD'||PUT((CEIL(INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)/2)),3.)||'.'); END;
ELSE IF T9 & V THEN DO; IF V9 & V99=0 THEN DO;
LENGTH=COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+1)/2),3.)||'.1'); END;
ELSE IF V9 & V99 & V999=0 THEN DO;
LENGTH=COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+2)/2),3.)||'.2'); END;
ELSE IF V9 & V99 & V999 THEN DO;
LENGTH=COMPRESS('PD'||PUT(CEIL((INPUT(SUBSTR(CODE,L1+1,L2-L1-1),3.)+3)/2),3.)||'.3'); END; END; END; END; ... ...
•