Source Code Revision Control Systems and Auto-Documenting Headers for SAS® Programs on a UNIX or PC Multiuser Environment
Terek Peterson, Alliance Consulting Group, Philadelphia, PA
Max Cherny, Alliance Consulting Group, Philadelphia, PA
ABSTRACT
This paper discusses two free products available on UNIX and PC environments called SCCS (Source Code Control System) and RCS (Revision Control System). When used in a multi-user environment, these systems give programmers a tool to enforce change control, create versions of programs, document changes to programs, create backups during reporting efforts, and automatically update vital information directly within a SAS program. These systems help to create an audit trail for programs as required by the FDA for a drug trial.
INTRODUCTION
The pharmaceutical industry is one of the most regulated industries in this country. Almost every step of a clinical trial is subject to very strict FDA regulations. SAS programs used to summarize and analyze the results of a clinical trial are no exception. SAS programs are considered by the FDA to be an important element of the reliability and accuracy of the information used to approve and manufacture drugs. These programs must adhere to FDA regulations, which require extensive design and documentation controls. The FDA requires SAS code to be compliant with a System Development Life Cycle (SDLC). This includes, but is not limited to, change control, backup/recovery and installation procedures. Drug companies must be able to provide documentation of their software processes and procedures on demand in order to show that their software has been safely created and that any changes can be properly managed.
Whether or not a clinical trial is involved, adherence to an SDLC is good systems development practice. Besides helping drug companies meet FDA regulations, proper implementation of an SDLC can reduce development costs and decrease the time it takes to report and analyze a clinical trial.
Source code control refers to controlling the process of modifying software by managing changes. It lets developers control who can make changes and when these changes are made. It helps to prevent conflicts that could arise when many people edit the same file. It lets programmers save multiple versions of a file and choose the one they would like to use. It also lets them review the history of changes made to a file.
There are many different utilities and applications on the market designed to help developers implement a source code control process. This paper discusses two of them, Revision Control System (RCS) and Source Code Control System (SCCS). These are free utilities which are readily available on most UNIX systems and can also be installed on Windows 9x/NT. These systems have been used for many years, and a great deal of free documentation on their usage is available.
RCS VS. SCCS
The SCCS system was developed by AT&T in the 1970s as a simple system to control source code development. It includes various features that help support a production development environment.
RCS, a more powerful utility than SCCS, was written in the early 1980s by Walter F. Tichy at Purdue University in Indiana. It built on many programs developed in the 1970s, learned from the mistakes of SCCS, and can be seen as a logical improvement on SCCS. Because RCS stores the latest version in full, it is much faster at retrieving the latest version. RCS is also faster than SCCS at retrieving older versions, and it has an easier interface for first-time users. RCS commands are more intuitive than SCCS commands. In addition, there are fewer commands, the system is more consistent, and it has a greater variety of options. Since RCS is a newer, more powerful source code control system, this paper focuses primarily on it.
KEY FEATURES OF SOURCE CONTROL UTILITIES
RCS does not store a complete copy of every version of a file; apart from the latest version, which is kept in full, it stores only the differences between versions. All versions of a program are therefore saved in a space-efficient way, which can significantly reduce the space required to keep all source files.
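As a rough illustration of why storing differences saves space, the following sketch compares the size of a full second copy of a program with the size of the difference between two versions. It uses the standard diff utility rather than RCS itself, and the file names and contents are invented for the example; RCS computes and stores its own deltas internally.

```shell
# Illustration only: compare the size of a full second copy with the size of a delta.
# Uses the standard diff utility, not RCS; file names and contents are hypothetical.
workdir=$(mktemp -d)

# Version 1: a small SAS program padded with comment lines
{
  echo "data demog;"
  i=1
  while [ "$i" -le 50 ]; do
    echo "  * derivation step $i;"
    i=$((i + 1))
  done
  echo "run;"
} > "$workdir/v1.sas"

# Version 2: the same program with a two-line PROC PRINT step appended
cp "$workdir/v1.sas" "$workdir/v2.sas"
printf 'proc print data=demog;\nrun;\n' >> "$workdir/v2.sas"

# diff exits with status 1 when the files differ, which is expected here
diff "$workdir/v1.sas" "$workdir/v2.sas" > "$workdir/delta.txt" || true

full_size=$(wc -c < "$workdir/v2.sas" | tr -d ' ')
delta_size=$(wc -c < "$workdir/delta.txt" | tr -d ' ')
echo "full copy of version 2: $full_size bytes; delta from version 1: $delta_size bytes"
```

The larger the program and the smaller each change, the more pronounced the saving becomes.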
Because subsequent changes do not overwrite the original, previous revisions remain easily accessible. Users are always able to retrieve any given revision using version numbers, symbolic names, dates, authors and states. Users can revert to an older version if they decide that the current version is incorrect, or if they want to see some text that has since been changed or deleted. The only way to permanently delete or corrupt a program is to modify a file located in the RCS directory. The ability to recreate old versions of a single file or a group of files allows users to use source code control as a simple backup system and to recover and modify earlier releases of a product or document. By keeping all file revisions in a separate location, source code control utilities protect these programs from many cases of accidental file deletion or corruption. If a user checks out a program and then accidentally removes it, the original version can easily be restored, since a copy is kept in another location.
RCS allows users to perform a limited, personalized backup of files. If files are being backed up once a week, programmers can check their daily progress notes into a source code control system and have a safe copy in case any of these files are accidentally damaged or deleted. This should not be viewed as a replacement for system backups, because users will still be vulnerable to system or disk problems, but it may provide better control over recovering from a disaster.
RCS may be used to formalize the process of installing the final version of a program from a user's environment to a production environment. A programmer may check in a final version of a program from their user area and then send a request to a dedicated person to install that version into a production area.
The installation of a program into a production area will be done simply by checking out files from the RCS library.
All changes to a file are automatically stored in a log file within the RCS subdirectory. The text of each revision, the author's identification, the date and time of check-in, and a log message summarizing the change are kept by RCS. This allows anyone to review the entire history of a program without having to track down colleagues or look for old notes and comments. In other words, it creates an audit trail. RCS provides a version numbering scheme so users can tell which versions of a file are more recent.
By saving the history of revisions to a file, users are able to analyze that history later. This can be invaluable because it gives them the ability to see the logic of each incremental change that led from the original source to the current source. If several people are working with the same version, a source code control system can help them coordinate their work by alerting the programmers, preventing one modification from corrupting the other and keeping track of who did what and when.
RCS allows branching of versions. This gives users the capability to perform parallel development of several different variants of the same file. For example, while working on the version 2.0 release users can produce a maintenance update to the 1.0 version by modifying several source files from the 1.0 release. RCS stores files as a series of revisions that evolved from each other, as a tree. Each node in the tree represents a revision of the same file.
The first version of a file forms the root. All subsequent revisions form the trunk. A branch occurs when a single revision has more than one successor revision. The latest revision on the trunk or on a branch is called a head. Branches are used to maintain parallel versions of the same file. These versions may then be merged together.
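The position of a revision in the tree can be read from its number: trunk revisions have two dot-separated fields (for example 1.3), while revisions on a branch have four or more (for example 1.2.1.1). The following is a minimal sketch of that rule; the function name rev_kind is hypothetical and is not part of RCS.

```shell
# Classify an RCS revision number by counting its dot-separated fields.
# Trunk revisions have two fields; branch revisions have an even number
# of fields, four or more. The function name rev_kind is hypothetical.
rev_kind() {
  fields=$(printf '%s\n' "$1" | awk -F. '{ print NF }')
  if [ "$fields" -eq 2 ]; then
    echo "trunk"
  elif [ "$fields" -ge 4 ] && [ $((fields % 2)) -eq 0 ]; then
    echo "branch"
  else
    echo "not a revision number"
  fi
}

rev_kind 1.3      # prints: trunk
rev_kind 1.2.1.1  # prints: branch
```

A number with an odd count of fields, such as 1.2.1, names a branch as a whole rather than a single revision, which is why the sketch rejects it.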
RCS can automatically update information, including program name, author name, revision number, creation time and date within a program itself. A programmer does not have to worry about updating comments in a program since RCS does it automatically. This is arguably one of the best features of RCS.
Please note that source control systems are good tools for controlling all sorts of files, not just source code. Scripts, flat ASCII files, logs and output files can be controlled by using RCS as well.
THE FLOW OF A PROGRAM USING THE RCS SYSTEM
[Figure: flow diagram for the example program t10_25.sas, with the stages First Draft, Check-In, RCS Library, Check-Out, Modify Program, Check Back In, Put into Production, and Production Version (Read-Only).]
Figure 1. The flow of a program using the RCS system
A draft program is developed in a programmer's user or test area. After the program is tested, it is checked into the RCS library. The program header should contain RCS keywords so that it can be updated by RCS. Another user, who has write access to the production area, puts the program into production by checking it out into the directory where all production code is stored. If the program requires modification, it is checked out for editing. The programmer checks the program back into the RCS library after all necessary changes are made.
Therefore the RCS library functions as an extra protection layer between production and user areas.
USING RCS
One of the reasons these utilities are so useful is their simplicity. Most of the work is done through a few easy-to-remember commands. The following example shows how to use RCS.
First, for RCS to work, an RCS directory must exist. It can be created in the same directory as the program files, or it may be located elsewhere.
Make the RCS directory:
mkdir RCS
Create a sample program file called testrcs.sas with a header containing some of the keywords:
/*******************************************
Program: $Source$
Version: $Revision$
Date: $Date$
Programmer: $Author$
---DO NOT MODIFY ABOVE THIS LINE ---
Project: Ax14723
Platform: UNIX
Purpose: Print all patients in dataset demog
Requirements: N/A
Parameters: N/A
Macros: N/A
Formats: N/A
Input: N/A
Output: N/A
Assumptions: N/A
Invocation: N/A
---DO NOT MODIFY BELOW THIS LINE ---
Modification History:
$Log$
*******************************************/
data demog;
input pid $ sex $ age height weight;
cards;
1001 m 14 69 112.5
1002 f 13 56.5 84
1003 f 13 65.3 98
run;
To place testrcs.sas under RCS control, type:
ci testrcs.sas
RCS produces a message:
RCS/testrcs.sas,v <-- testrcs.sas
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>>
Enter a description of this version, for example, type:
Created dataset demog
To save, enter . (a period) on a line by itself. RCS produces a message:
initial revision: 1.1
done
RCS has just reported that testrcs.sas was placed under RCS control.
Note that the program testrcs.sas is no longer located in the directory. However, in the RCS directory there is now a file testrcs.sas,v. The contents of this file are not the same as the original program file; it contains the special codes that RCS uses to keep track of the file. RCS automatically starts version numbering at 1.1, but symbolic names can be used instead of version numbers. That's all: the file is now being managed by RCS.
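The interactive check-in above can also be scripted. The following is a minimal sketch, assuming the RCS commands are installed and on the PATH; the scratch directory and the three-line program are hypothetical. It uses two ci options: -u, which leaves an unlocked, read-only working copy in place after check-in, and -t- followed by text, which supplies the description on the command line instead of at the prompt.

```shell
# Sketch: scripted initial check-in that keeps a read-only working copy.
# Assumes RCS is installed; the block is skipped otherwise.
# The scratch directory and the program contents are hypothetical.
if command -v ci >/dev/null 2>&1; then
  demo=$(mktemp -d)
  mkdir "$demo/RCS"
  printf '/* $Revision$ */\ndata demog;\nrun;\n' > "$demo/testrcs.sas"
  (
    cd "$demo" &&
    # -u       keep an unlocked, read-only copy of testrcs.sas after check-in
    # -t-text  take the description from the command line, not from a prompt
    ci -u -t-"Created dataset demog" testrcs.sas </dev/null &&
    ls -l testrcs.sas RCS/testrcs.sas,v
  ) || echo "check-in did not complete"
else
  echo "RCS is not installed; nothing to demonstrate"
fi
```

With -u the working file stays in the directory, so other programs that read it keep working, while the read-only permission discourages edits made without a lock.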
To open testrcs.sas as read-only, type the following command:
co -kv testrcs.sas
The -kv option tells RCS to insert only the keyword values into the header, without the keyword names and $ delimiters that would otherwise surround them.
Open the program and notice that the information in the header was automatically updated!
To modify the file and create the next version of this program, open the file in edit mode by typing:
co -l testrcs.sas
The -l option locks the file so that only one person can edit it at a time.
RCS will display the following message:
RCS/testrcs.sas,v --> testrcs.sas
revision 1.1 (locked)
done
Modify the program by adding a simple proc print statement:
proc print data=demog;
run;
Check the program back into the RCS by repeating the ci command:
ci testrcs.sas
RCS produces a message:
RCS/testrcs.sas,v <-- testrcs.sas
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
Notice that the version number was incremented to 1.2. If the program is checked out again and then checked back in with further changes, RCS will make it version 1.3. See the example below.
Check out testrcs.sas again and notice that its header was updated as follows:
/*******************************************
Program: C:\SUG/RCS/testrcs.sas,v
Version: 1.3
Date: 2000/02/22 18:33:51
Programmer: CHERNM02
---DO NOT MODIFY ABOVE THIS LINE ---
Project: Ax14723
Platform: UNIX
Purpose: Print all patients in dataset demog
Requirements: N/A
Parameters: N/A
Macros: N/A
Formats: N/A
Input: N/A
Output: N/A
Assumptions: N/A
Invocation: N/A
---DO NOT MODIFY BELOW THIS LINE ---
Modification History:
Testrcs.sas,v
Revision 1.3 2000/02/22 18:33:51 CHERNM02
Added title to proc print
Revision 1.2 2000/02/22 18:33:25 CHERNM01
Added proc print
Revision 1.1 2000/02/22 18:32:54 CHERNM01
Initial revision
*******************************************/
One of the most powerful features is the ability to check out any version of a program. This is done by using the -r option. For example, let's assume there are three versions of testrcs.sas and we need to edit the second version. This version is numbered by RCS as version 1.2.
Type the following command to check out an earlier version:
co -l -r1.2 testrcs.sas
This will open the second revision of testrcs.sas in edit mode.
Note that RCS will number the next version of testrcs.sas checked in from this copy as version 1.2.1.1 instead of 1.4, because version 1.3 already follows version 1.2 on the trunk. All subsequent changes to that version will continue along the branch. Accordingly, a change to version 1.2.1.1 will create version 1.2.1.2 (see Figure 2).
Figure 2. Tree of revisions
OTHER USEFUL COMMANDS
Use the RCS rlog command to see a document's revision history:
rlog testrcs.sas
It will produce the following output:
RCS file: RCS/testrcs.sas,v
Working file: testrcs.sas
head: 1.3
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 3; selected revisions: 3
description:
Initial version
---
revision 1.3
date: 2000/02/22 16:09:31; author: chernm02; state: Exp; lines: +7 -3
Added title to proc print
---
revision 1.2
date: 2000/02/22 16:08:06; author: chernm01; state: Exp; lines: +12 -6
Added proc print
---
revision 1.1
date: 2000/02/22 16:07:14; author: chernm01; state: Exp;
Created dataset demog
RCS can show all files currently being edited by the users:
rlog -L -R RCS/*
This will produce a list of files currently being edited:
RCS/testrcs.sas,v
As you can see, RCS is easy to use, but each of its commands has many different options. On UNIX, read the man pages for the full list of options.
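One further command that complements rlog is rcsdiff, which prints the differences between two stored revisions. The following is a minimal sketch, assuming RCS is installed; the scratch directory, program contents, and log text are hypothetical. It builds two revisions non-interactively (ci -l keeps a locked, editable working copy; -m supplies the log message) and then compares them.

```shell
# Sketch: compare two revisions with rcsdiff. Assumes RCS is installed; skipped otherwise.
# The scratch directory, file contents, and log text are hypothetical.
if command -v rcsdiff >/dev/null 2>&1; then
  scratch=$(mktemp -d)
  mkdir "$scratch/RCS"
  printf 'data demog;\nrun;\n' > "$scratch/testrcs.sas"
  # ci -l checks in revision 1.1 and keeps a locked, editable working copy.
  # ci -u checks in revision 1.2 and keeps a read-only working copy.
  # rcsdiff exits with status 1 when the revisions differ, hence the || true.
  changes=$(
    cd "$scratch" &&
    ci -q -l -t-"Created dataset demog" testrcs.sas </dev/null &&
    printf 'proc print data=demog;\nrun;\n' >> testrcs.sas &&
    ci -q -u -m"Added proc print" testrcs.sas </dev/null &&
    { rcsdiff -q -r1.1 -r1.2 testrcs.sas || true; }
  ) || changes=""
  printf '%s\n' "$changes"
else
  echo "RCS is not installed; nothing to demonstrate"
fi
```

The output lists the lines added in revision 1.2, which is a quick way to review what a colleague changed before putting a program into production.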
AUTO-DOCUMENTING HEADERS
No program is complete without proper documentation. Any source program should contain, at a minimum, information about its author, last modification date, program name, and a description of the code and all the changes that have been made to the code. Unfortunately, many programs are not sufficiently documented. Often, even if there are comments in a program, they are not up to date. Also, many programmers forget to comment on all the changes they make to a program.
RCS may help to ensure the proper documentation of source code. It allows users to insert certain keywords into working files. These keywords all have special meaning. Each special keyword is inserted into a comment in the original version of a program. When these keywords are in a file during a check-out, the check-out command updates the value of each keyword with the information stored in the RCS library. The most useful RCS keywords are:
$Source$
$Author$
$Date$
$Revision$
$Log$
The $Log$ keyword corresponds to the comments which are entered during a version check-in. These comments will then be automatically inserted into the program. Therefore, a programmer no longer has to be concerned with updating the main documentation in the program, because RCS will do most of the work. It will automatically update any file information as long as the proper RCS keywords are used within a comment. A programmer cannot forget to describe changes to a program, because RCS prompts for a description of the changes every time the program is checked in. Needless to say, it is still the responsibility of the programmer to document the program properly.
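To make the substitution concrete, the following sketch imitates value-only keyword expansion (the effect of co -kv) for two of the keywords. It is written with sed purely for illustration; RCS performs the substitution itself at check-out, and the function name expand_keywords and its two arguments are hypothetical.

```shell
# Imitation of value-only keyword substitution (the effect of co -kv),
# written with sed for illustration; RCS does this itself at check-out.
# The function name and its two arguments are hypothetical.
expand_keywords() {
  revision=$1
  author=$2
  sed -e 's/\$Revision\$/'"$revision"'/' \
      -e 's/\$Author\$/'"$author"'/'
}

printf 'Version: $Revision$\nProgrammer: $Author$\n' |
  expand_keywords 1.2 chernm01
# prints:
# Version: 1.2
# Programmer: chernm01
```

Because the keywords are replaced by plain values, a file checked out this way is best treated as read-only; a copy intended for editing should keep the keywords so that later check-outs can update them again.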
IMPLEMENTING A VERSION CONTROL SYSTEM IN YOUR COMPANY
It is very easy to start using a version control system in your company. However, consider the following issues before implementing a version control system:
Choice of SCCS vs. RCS.
For the reasons given earlier (faster retrieval, fewer and more intuitive commands, and a greater variety of options), RCS appears to be the better choice.
Location of the RCS directory.
The directory where RCS files are stored must be easily accessible, readable and writeable by all users. Note that the entire path to the RCS files must be specified explicitly if the RCS directory is located anywhere other than in a subdirectory of the working directory.
[Figure 2 content: revision tree for testrcs.sas. Trunk: Version 1.1 (root), Version 1.2 (branch start), Version 1.3 (head). Branch: Version 1.2.1.1 (branch node), Version 1.2.1.2 (branch head).]
Choice of keywords in the program headers
RCS has various keywords which could be inserted into SAS programs. Not all of them are needed. At least the following keywords should be used: $Source$, $Date$, $Author$ and $Log$.
Format of standardized headers
Since RCS keywords are going to be a part of a program's header it may be helpful to develop standard headers used by all programmers in the group. It will simplify the development of a consistent source control process.
Development of user-friendly menu systems
RCS is a very powerful and flexible system. It provides users with many different options, which may be difficult to remember. To simplify the use of these utilities and minimize the chance of a mistake, develop a user-friendly menu system. A menu system can shield users from having to remember the various RCS commands. A basic menu system can easily be developed using UNIX scripts or batch files. Several commercial GUI versions of RCS also exist.
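A basic menu of this kind might start from a function that maps a menu choice to one of the RCS commands shown earlier. The sketch below is only one possible design; the function name rcs_command_for and the numbering of the choices are hypothetical. It prints the command rather than running it, so the mapping can be reviewed before it is wired to a prompt.

```shell
# Hypothetical menu helper: maps a menu choice to an RCS command from this paper.
# It only prints the command so the mapping can be checked before anything runs.
rcs_command_for() {
  choice=$1
  file=$2
  case "$choice" in
    1) echo "ci $file" ;;       # check in
    2) echo "co -kv $file" ;;   # check out read-only, keyword values only
    3) echo "co -l $file" ;;    # check out locked for editing
    4) echo "rlog $file" ;;     # show the revision history
    *) echo "unknown choice: $choice" >&2
       return 1 ;;
  esac
}

rcs_command_for 3 testrcs.sas   # prints: co -l testrcs.sas
```

A full menu script would display the numbered choices, read the user's selection and file name, and then execute the command that this function returns.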
TRADEMARKS
SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA. ® indicates USA registration.
CONTACT INFORMATION
Terek Peterson, MBA
Clinical SAS Programming Consultant
Alliance Consulting Group
(610) 917-5303 (W)
E-mail: [email protected]
Max Cherny
Clinical SAS Programming Consultant
Alliance Consulting Group
(610) 917-6719 (W)
E-mail: [email protected]
REFERENCES
1. UNIX man pages
2. Robin Burk and David Horvath, "UNIX Unleashed, Internet Edition", Sams, 1997
3. Sams Publishing, "UNIX Unleashed", Sams, 1994