Practical Powerful Version control in SAS projects

A rapid, Walk-through Workshop

Lorne Salter, Blue Shield of California, San Francisco, CA

Gordon Cumming, Wells Fargo Bank, N.A., San Francisco, CA

ABSTRACT

Dave Thomas of The Pragmatic Programmer calls version control the super undo button, and much more. Versioning eliminates the proliferation of files named version 2, 3, 4 and can help keep development libraries clean and transparent, while providing the safety net of a vault (the repository) that safely keeps a record of all earlier states and versions in the development of the project or system. Comments on each step in development are also logged and stored. I have found versioning gives me the security to take the risks that creative, imaginative development sometimes requires, because when something breaks or doesn’t meet expectations, I can always go back to an earlier version that works.

INTRODUCTIONS

PRESENTERS:

Lorne Salter worked on the same project, an ATM data mart, for 3 years, and has worked with Subversion for a year at Blue Shield of California, on one non-SAS project and on a SAS project reporting on the value of a medical claims review vendor's results.

Gordon Cumming, a SAS programmer for over 30 years, is a SAS administrator and developer at Wells Fargo and has worked with CVS in SAS development there for 6 years.

BACKGROUND HISTORY:

At Wells Fargo there were up to 4 SAS developers working on projects at one time, but usually 2. Even with this small number of collaborators, the ability to track who made which change (Subversion has a tongue-in-cheek command called "blame") and to revert to an earlier clean version proved invaluable. It made the work a lot less "exciting" interpersonally and more interesting for trying new ideas and approaches. The safety net of the repository encourages frequent commits with useful comments in the log, like "got rid of the data corruption issue by splitting the long transform into 3 separate parallel jobs". (FYI, we were using the LSF scheduler and versioned our flows in that tool too.)

This presentation is about version control for the benefit of analysts and programmers; it is not really about change management for managers. Though the two are related, a change management system may or may not have any apparent benefit for the developers. Dave Thomas of The Pragmatic Programmer calls version control the super undo button, and much more. Every time you commit, there is an opportunity to add a message about your thinking, where you are, and what the change does. When you browse the repository you see the revision history and the messages. In fact, the flaw in the Diebold voting machines was uncovered through comments in the versioning log, which was accessible online. The investigators saw comments like "this really does not lock it up." made by the developers as they saved changes. Thus, for anyone from a new team member to an external auditor, systems with properly used version control are much more transparent. The detective work and anxiety involved in taking on code in an existing system or process are reduced, and I for one sleep better with it in place. As a developer, I think it makes my life easier.

We outline the experience, benefits and "costs" using the popular, completely free open source versioning programs CVS and its more powerful successor Subversion (embraced by such companies as Google) with a SAS data mart in a major bank and a SAS data repository and reporting system in a California health insurance company. A how-to on automatic sensing of development, testing and production environments with toggling of directory trees and libnames in UNIX is also presented. Subversion for Windows and the Tortoise Subversion GUI client for Windows Explorer are demonstrated.

The workshop will demonstrate how to download and install the free software and how to use CVS and Subversion in typical daily SAS project work. Tim Williams’s excellent paper at SAS Global Forum 2007 is referenced and different aspects are highlighted, along with an update on Subversion, the rather slick and improved successor to CVS.


KEYWORDS:

SAS, development, test, production, versioning, version control, CVS, Subversion

DEVELOPMENT/UNIT TEST, QA/SYSTEM TEST, UAT, AND PRODUCTION

On the mainframe (z/OS) these strategies are built into the system: dataset names are changed to filenames within the processes (procs), and off you go. You just code references to the filenames and the operating system takes care of the rest. This was a design objective of IBM when they designed the OS.

For UNIX and MS Windows you have several options.

Separate machines, one per environment, each with a duplicate directory structure. Depending on the power of the machines and the licensing costs, this could cost quite a lot. The code does not have to change, because the environment (the directory structure) is duplicated on each machine. This carries the least risk because there are no code changes; everything is promoted to each environment unchanged. What works in one environment should, in theory, work in the next.

A single machine with a single image, with each environment on a different tree structure or mount/share point, and the code changed between environments. This introduces risk, because what was tested is not really what is promoted into production. Syntax errors, environmental errors and forgotten changes all add to the risk. I have seen all of these errors, and they are very common.

A single machine with a single image, with each environment on a different tree structure or mount/share point, but with a separate configuration for each environment so that the code need not be changed between environments. Some of the techniques are discussed in this part of the presentation.

Under UNIX, alias names can be built to pass different -config parameters to define the environment:

alias sas_dev='sas -config /sas/config/config.dev'

alias sas_prod='sas -config /sas/config/config.prod'
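The same idea can be wrapped in a small shell function, so that the environment name becomes an argument instead of needing one alias per environment. This is only a sketch: the function name run_sas and the SAS_BIN and SAS_CONFIG_DIR variables are assumptions made for illustration, and the config file names follow the /sas/config/config.<env> pattern used above.

```shell
#!/bin/sh
# Hypothetical settings (not from the original setup); override as needed.
SAS_BIN="${SAS_BIN:-sas}"
SAS_CONFIG_DIR="${SAS_CONFIG_DIR:-/sas/config}"

# run_sas ENV PROGRAM
# Runs PROGRAM with the config file that belongs to ENV (dev, qa, test, prod).
run_sas() {
  case "$1" in
    dev|qa|test|prod) ;;
    *) echo "usage: run_sas dev|qa|test|prod program.sas" >&2
       return 2 ;;
  esac
  "$SAS_BIN" -config "$SAS_CONFIG_DIR/config.$1" "$2"
}
```

For example, run_sas qa monthly_report.sas would run the (hypothetical) program monthly_report.sas with /sas/config/config.qa, and an unknown environment name is rejected before SAS is started.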

For the Microsoft Windows environment, a shortcut can be built with the -config c:\sas\config\config.dev parameter coded in the command line. By editing the file type for the .sas extension you can add actions such as run under development, run under production, run under …, each with the correct command associated with the file type.

EXAMPLE OF CONFIG FILES;

/*
 * config.dev
 */
...
-autoexec /sas/config/autoexec.dev
-sasuser ~/sasuser.v91
-news /sas/config/news.dev
...

/*
 * config.qa
 */
...
-autoexec /sas/config/autoexec.qa
-sasuser ~/sasuser.v91
-news /sas/config/news.qa
...

/*
 * config.test
 */
...
-autoexec /sas/config/autoexec.test
-sasuser ~/sasuser.v91
-news /sas/config/news.test
...

/*
 * config.prod
 */
...
-autoexec /sas/config/autoexec.prod
-sasuser ~/sasuser.v91
-news /sas/config/news.prod
...

AND EXAMPLES OF AUTOEXEC FILES CORRELATING TO THE CONFIGURATION FILES;

*---;
* Development Autoexec file;

* Set the working environment - dev/qa/test/prod;
%let _env=dev;
%let _root=/sas/&_env./data;

*---;
* Change the default SAS location;
x "cd ~/codebase";
...
filename code "~/codebase";
...

* Default month is the previous calendar month, as yyyymm;
%let month=%sysfunc(intnx(month,"&sysdate9"d,-1),yymmn6.);

*---;
* Load predefined macros;
%include "~/codebase/macros/*.sas";
%let syscc=0;
%put NOTE: default month is &month;

*---;
* Quality Test Autoexec file;

* Set the working environment - dev/qa/test/prod;
%let _env=qa;
%let _root=/sas/&_env./data;

*---;
* Change the default SAS location;
x "cd /sas/&_env./codebase";
...
filename code "/sas/&_env./codebase";
...

* Default month is the previous calendar month, as yyyymm;
%let month=%sysfunc(intnx(month,"&sysdate9"d,-1),yymmn6.);

*---;
* Load predefined macros;
%include "/sas/&_env./codebase/macros/*.sas";
%let syscc=0;


*---;
* User Acceptance Test Autoexec file;

* Set the working environment - dev/qa/test/prod;
%let _env=test;
%let _root=/sas/&_env./data;

*---;
* Change the default SAS location;
x "cd /sas/&_env./codebase";
...
filename code "/sas/&_env./codebase";
...

* Default month is the previous calendar month, as yyyymm;
%let month=%sysfunc(intnx(month,"&sysdate9"d,-1),yymmn6.);

*---;
* Load predefined macros;
%include "/sas/&_env./codebase/macros/*.sas";
%let syscc=0;
%put NOTE: default month is &month;

*---;
* Production Autoexec file;

* Set the working environment - dev/qa/test/prod;
%let _env=prod;
%let _root=/sas/&_env./data;

*---;
* Change the default SAS location;
x "cd /sas/&_env./codebase";
...
filename code "/sas/&_env./codebase";
...

* Default month is the previous calendar month, as yyyymm;
%let month=%sysfunc(intnx(month,"&sysdate9"d,-1),yymmn6.);

*---;
* Load predefined macros;
%include "/sas/&_env./codebase/macros/*.sas";
%let syscc=0;
%put NOTE: default month is &month;

THE DIRECTORY STRUCTURES LOOK LIKE THIS;

/sas
    /config
    /dev
        /data (data libraries for development)
    /qa
        /codebase (source versioned modules)
            /macros
            /project1
        /data (data libraries for qa)
        /log (log files for QA)
    /test
        /codebase
        /data
        /log
    /prod
        /codebase
        /data
        /log
    ...
/repository
    /cvs
    /subversion
    ...
/home
    /developer1
        /codebase
            /macros
            /project1
        /codebase1
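The /sas part of this layout can be created in one pass. The sketch below is an illustration only: the function name make_env_tree and its root-prefix argument are assumptions (the prefix lets the tree be built somewhere harmless first), and the dev branch gets only a data directory because development code lives in each developer's home directory.

```shell
#!/bin/sh
# make_env_tree ROOT
# Creates ROOT/sas/config, ROOT/sas/dev/data and, for qa, test and prod,
# the codebase, data and log directories.
make_env_tree() {
  root=$1
  mkdir -p "$root/sas/config" "$root/sas/dev/data" || return 1
  for env_name in qa test prod; do
    mkdir -p "$root/sas/$env_name/codebase" \
             "$root/sas/$env_name/data" \
             "$root/sas/$env_name/log" || return 1
  done
}
```

Calling make_env_tree with an empty string as ROOT would build the tree at /sas itself, given sufficient permissions.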

Macro variables set up in each of the autoexec files help when there needs to be a directory reference point for finding or writing flat files.

%let input_dir=/flatfile;
%let extract_dir=/extracts;

-or-

%let input_dir=d:\flatfile;
%let extract_dir=d:\extracts;

With a source code versioning system, most “modules” are checked out into a single directory (the head directory), with each module in its own subdirectory of the head directory. This makes including code a lot easier when the autoexec contains one of the following statements.

filename code "~/codebase"; /* for development */

-or-

filename code "/test/codebase"; /* for test */

-or-

filename code "/production/codebase"; /* for production */

So with these filenames defined, code can be included easily, with no change to the code as it is promoted. This relies on a feature introduced over 30 years ago that is still used on the mainframe but is not used much in the UNIX and Windows arena. The file reference “code” used here refers to the filename statement above as the base directory from which the complete filename, with path, is built.

%include code(module_name/sub_dir/code_to_be_included.sas); -or-

%include code(module_name\sub_dir\code.sas);

This allows separate development, unit test, system test, QA, and production environments to exist on the same box without multiple license costs.

Automating the SAS code eliminates costly daily, weekly, monthly, quarterly, and yearly edits (as you can see, I am basically lazy). Also, getting older, I am forgetting things. So I learned to develop macro conditionals that execute outside of SAS macros, using %scan and %eval to set values. One of the uses was to set default parameters (unless overwritten) as follows:

%let parameter=%scan(&sysparm default,1,%str( ));

This sets the value of parameter to the first word of sysparm (passed via the command line) or, if sysparm is blank, to a default value. If blanks are involved in the parameter, then use a colon or some other unused character as the delimiter instead.

%let parameter=%scan(&sysparm:default blank,1,:);
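To see which value wins, the defaulting logic can be mimicked outside SAS. The shell sketch below is not SAS code: scan_first and resolve_parameter are hypothetical helpers that imitate how %scan returns the first non-empty token, so an empty sysparm falls through to the default while a supplied value, blanks included, is returned whole.

```shell
#!/bin/sh
# scan_first STRING DELIM
# Prints the first non-empty token of STRING split on the single character
# DELIM. The body runs in a subshell so IFS and globbing changes stay local.
scan_first() (
  IFS=$2
  set -f                      # no filename expansion while splitting
  for token in $1; do
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      exit 0
    fi
  done
)

# resolve_parameter SYSPARM DEFAULT
# Mirrors the colon-delimited form: first token of "SYSPARM:DEFAULT".
resolve_parameter() {
  scan_first "$1:$2" ":"
}
```

With an empty first argument the string starts with the delimiter, the empty leading token is skipped, and the default is printed; this is the same reason the SAS statement above yields the default when sysparm is blank.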

FOR CONDITIONAL STEP EXECUTION I USED EITHER

%let sql_state=%scan(exec:noexec,%eval(&sysrc=1)+1,:);
exec &sql_state;
-code-
exec exec;

-or-

%let run_state=%scan(cancel: ,%eval(&monthly=yes)+1,:);
data _null_;
-code-
run &run_state;

No macros were defined to execute this code. No code was harmed in the writing of this paper.

SOURCE/VERSION CONTROL WITH CVS AND SUBVERSION.

Version control methodologies come in many flavors. Some have a separate module for every project, some group projects in separate subdirectories of a module, and some categorize the code by function (ETL, push, pull, report, audit, ...) and place each category in a separate module or in a separate subdirectory of a module.

Each developer has their own copy of the modules that they are working on. Once the code has been unit tested by the developer, the code is level set with a tag. The QA/system testing team can then update their libraries to the specific tag (level set) and proceed with system, functional and regression testing, sending the results back to development, where any fixes are applied and tested. Next, the project lead updates the UAT (User Acceptance Test) environment with the latest level set, and when the users sign off, the code is scheduled to be updated in production. During this time, other development work on the same code can go on without disturbing the code moving through the pipe. Checking out into different head directories allows a developer to work on several levels of the same code and, when finished with a level set, to merge the code into other updates in progress.
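In Subversion, a level set can be recorded as a tag, which is a cheap copy of trunk made under tags. The following command line sketch assumes a project with trunk and tags directories in the repository; the function names, the project URL and the tag name are illustrative only and are not taken from the systems described here.

```shell
#!/bin/sh
# tag_level_set PROJECT_URL TAG
# Records the current state of trunk as PROJECT_URL/tags/TAG.
tag_level_set() {
  svn copy -m "Level set $2" "$1/trunk" "$1/tags/$2"
}

# update_env_to_level_set WORKING_COPY PROJECT_URL TAG
# Points an environment's working copy (QA, UAT, production) at the tag,
# so it contains exactly the code that was level set.
update_env_to_level_set() {
  svn switch "$2/tags/$3" "$1"
}
```

Development can keep committing to trunk after the tag is made; the environments only move when they are switched to a newer level set.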

The code, though, should be stable and not modified on a frequent basis (daily, weekly, monthly). This minimises the occurrence of errors that can cause production problems. There is usually a way to automate code so that it does not need any modification. I have been challenged on this quite a few times and have prevailed. Sometimes the way is not obvious, but that's when I start flipping through the reference manuals or do a few web searches. I have heard that some have asked SAS-L for solutions and got them.

TYPICAL DAY WITH VERSION CONTROL - WINDOWS SUBVERSION TORTOISE GUI EXAMPLE;

PROJECT A:

Suppose I have just arrived at my cubicle, and I am working on a SAS program related to a reporting system in my development environment. This is a "working copy" of the system, in a folder with my name under our team's development directory, that has been "checked out" from the Subversion repository. More about checking out working copies in a minute. My program is a collection of loosely related macros that read in various data feeds. Depending on which files have arrived, a calling program includes the appropriate macros. There are several input statement macros that add fixed-column text data, arriving weekly, to a large SAS table used for reporting. Some new elements have been added to the data feed, and when I left yesterday I was halfway through adding the elements to one of the input statements. I finish the input statement and run an informal test against some preliminary test data sent by the business partner who provides the text data. After a little debugging I am happy with the changes and decide to commit them.

Typical process: I right click on the report_macros.sas program in the regular Windows Explorer. The program has a little red sub-icon on it, indicating that changes have been made and it is no longer identical to the repository version. The regular Windows file menu appears with 2 special Subversion items, SVN commit and SVN. I select SVN commit and a dialogue box appears. I am primarily interested in the message box, so I type in "added 3 new variables, billing_code, region and product_code - Lorne S. November 5 2008" and then hit the commit/OK button. A log scrolls by indicating whether the commit was successful and giving the version number. If I refresh Windows Explorer, the red "changed" sub-icon changes to a green checkmark. (As of writing, this update does not always happen right away; the log is a better source of information.)

For those who prefer the command line, both UNIX (of course) and Windows have a command line equivalent with copious feedback when you run the commit filename command.
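On the command line, the same commit might look like the sketch below. The function name commit_change and its arguments are assumptions for illustration; svn status, svn diff and svn commit are the standard Subversion subcommands.

```shell
#!/bin/sh
# commit_change WORKING_COPY FILE MESSAGE
# Shows what changed in FILE, then commits it with MESSAGE.
commit_change() {
  (
    cd "$1" || exit 1
    svn status "$2"            # an M in the first column means modified
    svn diff "$2"              # review the change before committing
    svn commit -m "$3" "$2"    # prints the new revision number
  )
}
```

For example, commit_change ~/codebase report_macros.sas "added 3 new variables, billing_code, region and product_code" would correspond to the Tortoise steps described above, assuming the working copy lives in ~/codebase.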


PROJECT B:

I now take a break from this project to get started on a new project I have just taken on. I will be working with 4 people who have started a project to report on quality to an industry quality organization. The system so far was built by a very strong SAS programmer. It is a reasonably well organized system, but sometimes the collaborators have trouble locating things, or even identifying which program is the latest in a folder they have not looked at in a while. I have offered to create a Subversion repository. They gave me a "clean" directory with the latest programs, leaving out things like logs, test data etc. which they do not want versioned.

The first step is to create a repository, which can be done in Tortoise by right clicking an empty directory with a name like svn. Then, one recommended approach is to import a directory structure consisting of the project name with subdirectories trunk, branches and tags (we are not covering branches and tags, but they are cool and well documented). Then import the clean directory under trunk. All this can be done by right clicking in Explorer if Tortoise Subversion is being used, or with straightforward terminal line commands like > SVN import ...parm parm parm. Note that the clean copy I imported from is NOT versioned. The next step is for me and all the collaborating developers to check out our own working copies, which are our own personal development environments for that project.

Working copy checkout: the final and most important step is to "check out" a copy to a directory, e.g. Lornes_working_copy_project_B. Again with Tortoise Subversion, I can right click on the working directory I created and do SVN/checkout. The dialogue box allows me to navigate to the repository I am interested in and check out the project BELOW the TRUNK level (trunk is the main project location). The directories, subdirectories and files in the project are copied to the working directory, and I am ready to start working. Typically, you only check out a project once. You can frequently resync with the repository with a global or local "update". All this checking out, committing and updating seems like odious overhead at first. For the first few months our manager was on our case all the time to commit things and not circumvent the discipline of the versioning system. But the payoff that makes me a versioning fan is described in the next section.
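The Project B setup (create the repository, import the layout, import the clean directory under trunk, check out a working copy) can be sketched from the command line as follows. This assumes a repository on the local file system reached through a file:// URL, so the repository path must be absolute; the function names and arguments are hypothetical.

```shell
#!/bin/sh
# setup_project REPO_DIR PROJECT CLEAN_DIR WORKING_COPY
setup_project() {
  repo_dir=$1; project=$2; clean_dir=$3; working_copy=$4
  repo_url="file://$repo_dir"          # REPO_DIR must be an absolute path

  svnadmin create "$repo_dir" || return 1

  # Import an empty project/trunk, project/branches, project/tags skeleton.
  layout=$(mktemp -d) || return 1
  mkdir -p "$layout/$project/trunk" "$layout/$project/branches" \
           "$layout/$project/tags" || return 1
  svn import -q -m "Create layout for $project" "$layout" "$repo_url" || return 1
  rm -rf "$layout"

  # Import the clean directory under trunk; CLEAN_DIR itself stays unversioned.
  svn import -q -m "Initial import of $project" \
    "$clean_dir" "$repo_url/$project/trunk" || return 1

  # Each developer checks out a personal working copy of trunk.
  svn checkout -q "$repo_url/$project/trunk" "$working_copy"
}

# refresh_working_copy WORKING_COPY: resync with the repository.
refresh_working_copy() {
  svn update -q "$1"
}
```

Only the last step is repeated by each collaborator; the repository and the two imports are done once for the project.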

PROJECT A:

Finally, I get an email that the changes I made in Project A above are cancelled. No problem: I right click on the program inside my working copy, select SVN/revert to, and select the second most recent version. Voila, it is back as if the changes had not been made. But they are not really gone. Everything is saved. Then I commit the reverted copy to the repository and browse it. Note that the repository still has the changed version, but it has added a new version identical to the version before the change. So if the clients change their minds again, I am ready.
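One command line way to get the same result is a reverse merge of the unwanted revision followed by a commit. This is a sketch of that approach rather than a description of what the Tortoise menu does internally; the function name undo_revision and its arguments are assumptions.

```shell
#!/bin/sh
# undo_revision WORKING_COPY FILE REVISION MESSAGE
# Backs the change made in REVISION out of FILE and commits the result
# as a new revision. REVISION itself stays in the repository history.
undo_revision() {
  (
    cd "$1" || exit 1
    svn update -q || exit 1              # bring the working copy up to date
    svn merge -c "-$3" "$2" || exit 1    # negative number = reverse merge
    svn commit -m "$4" "$2"
  )
}
```

Because the undo is itself a commit, the cancelled change can be brought back later in the same way if the clients change their minds again.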

LICENSES AND DOCUMENTATION.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

CVS is licensed under the GPL and Subversion under an Apache-style open source license. Each includes a legal license which says, roughly, that the software is free and you can do almost anything with it except slap your own brand on it and sell it.

The documentation includes the license and an explanation of it, which is very useful for managers etc., so I send them a copy if needed.

The Subversion and Tortoise documentation is clear, readable, even entertaining and the online troubleshooting solved any error messages we encountered with relatively little difficulty.

Links: As already noted, there is a wealth of good up-to-date information on the web that a search will reveal. The SAS Global Forum paper and the official web sites below are an authoritative place to start.

SourceForge.net is a very good source of add-ons for these products. They include analysis of changes, auditing, web interfacing, and other very useful tools.


REFERENCES

http://www2.sas.com/proceedings/forum2007/006-2007.pdf (an excellent 2007 paper about experiences with CVS and SAS)

http://subversion.tigris.org/faq.html#log-in-source

http://tortoisesvn.net/downloads

http://www.abbeyworkshop.com/howto/misc/svn01/

http://ximbiot.com/cvs

http://sourceforge.net/search/?type_of_search=soft&type_of_search=soft&words=cvs

http://cvsgui.sourceforge.net/

RECOMMENDED READING

O’Reilly books on CVS and Subversion.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Lorne Salter

Email: [email protected] or [email protected]

Phones: work 415-229-6761, home 510-433-0303, cell 925-586-7201

Gordon Cumming
