CROS NT srl
Contract Research Organisation
Clinical Data Management
Statistics
Dr. Paolo Morelli, CEO
Dr. Luca Girardello, SAS programmer
Best Practice in SAS Program Validation: A Case Study
AGENDA
Introduction
Program Verification: a Business Approach
FACTS about CROS NT
• Headquarters in Verona (Italy)
• Founded in 1993
• Offices in Milan and Munich
• 40 employees
• Data Management, Statistical, PhV and hosting services
• Services to Pharma, Biotech and CROs
Introduction
• Topic of the presentation: how to maximize the quality of programming while minimizing the time needed to verify programs.
• In the first part of the presentation we will discuss the business side:
  What is program verification?
  Why is program verification necessary?
  When is program verification done?
  Who performs program verification?
  How does the verification process work?
• In the second part of the presentation we will discuss a case study.
What is program verification
• Making certain that the program does what it is supposed to do, and producing documented evidence of this.
Why is program verification necessary
• The aim of SAS validation in pharmaceutical research is for end users to produce high-quality programs that fit the purpose for which they are designed and provide accurate results, written in a style that promotes:
• Reliability
• Efficiency
• Portability
• Flexibility
When is program verification done
• Program verification should be performed as soon as possible after the development of the SAS code, before the “product” is put into production.
• Development and production environments should be clearly defined.
• Audit trail of program changes should be present as soon
Who performs program verification
The SAS programmer who creates the code should perform basic testing and follow coding rules, such as:
• Error log search
• Warning evaluation
• Comments on critical steps
• Comments on macro usage
• Details of the SAS program (datetime of creation, SAS programmer name, datasets used, datetime of verification, name of second SAS programmer, etc.)
• Program verification should then be performed by a second SAS programmer.
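The error log search and warning evaluation listed above could be automated along the following lines. This is only a minimal sketch: the log file name `demog_listing.log`, the dataset name `log_issues`, and the choice to also flag "uninitialized" notes are assumptions made for illustration, not part of the original process.

```sas
/* Hypothetical log location: point this at the saved log of the program under test. */
filename pgmlog "demog_listing.log";

data log_issues;
    infile pgmlog truncover lrecl=256;
    input logline $char256.;
    length issue $8;

    /* =: compares only the leading characters of the log line */
    if logline =: 'ERROR' then issue = 'ERROR';
    else if logline =: 'WARNING' then issue = 'WARNING';
    /* Assumption: notes about uninitialized variables are also worth a review */
    else if index(logline, 'uninitialized') > 0 then issue = 'NOTE';

    /* Keep only the flagged lines, remembering where they occur in the log */
    if not missing(issue);
    line_no = _n_;
run;

proc print data=log_issues noobs;
    var line_no issue logline;
run;
```

An empty `log_issues` dataset would suggest a clean log, although it does not replace reading the warnings that do appear.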
How does the verification process work
• Biostatistician creates specs, then submits request
• SAS developer produces TLGs, then submits verification request
• Quality Control programmer verifies results
• Interactive process
Different Verification Procedures
• The SOP should define different verification procedures:
✓ Independent programming
✓ Reviewing results
✓ Random review of results
✓ Visually verifying code
• Some of them should be mandatory, others optional.
• The document containing the programming specs (for example, the SAP) should define which approach to follow, illustrating the program verification techniques (for example, using alternative SAS programming procedures).
• The level of validation should be determined with a risk-based model. The key is to determine the effect on the process if the program does not produce the desired result.
Error Types
• Business strategy should identify common ‘error types’ found in:
✓ Statistical tables
✓ Listings
✓ Graphs
✓ Data analysis files
✓ Header section of SAS programs
✓ Bad programming specifications
• Metric report related to error type should be analyzed in order to
Specific CDISC SDTM Validation specs – Metadata Level
• Verifies that all required variables are present in the dataset
• Reports as an error any variables in the dataset that are not defined in the domain
• Reports a warning for any expected domain variables which are not in the dataset
• Notes any permitted domain variables which are not in the dataset
• Verifies that all domain variables are of the expected data type and proper length
• Detects any domain variables which are assigned a controlled terminology specification by the domain and do not have a format assigned to them
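The presence checks in the metadata-level rules (required, expected, permitted, and undefined variables) can be sketched as a join between a specification table and the dataset's actual variable list. Everything named here is an assumption for illustration: the abridged `dm_spec` table is not the real SDTM DM specification, and `work.dm`, `dm_vars`, and `metadata_check` are hypothetical names.

```sas
/* Hypothetical, heavily abridged spec: one row per domain variable with its core status. */
data dm_spec;
    length name $32 core $4;
    input name $ core $;
    datalines;
STUDYID Req
USUBJID Req
AGE Exp
RACE Exp
DMDTC Perm
;
run;

/* Toy dataset under test; EXTRA is deliberately not defined in the spec. */
data work.dm;
    length studyid $8 usubjid $12 race $20;
    studyid = 'ABC';
    usubjid = 'ABC-121';
    age = 50;
    race = 'WHITE';
    extra = 1;
run;

/* Variables actually present in the dataset */
proc contents data=work.dm out=dm_vars(keep=name type length) noprint;
run;

proc sql;
    create table metadata_check as
    select coalesce(upcase(s.name), upcase(v.name)) as varname length=32,
           s.core,
           case
               when v.name is null and s.core = 'Req'  then 'ERROR: required variable missing'
               when v.name is null and s.core = 'Exp'  then 'WARNING: expected variable missing'
               when v.name is null and s.core = 'Perm' then 'NOTE: permitted variable missing'
               when s.name is null                     then 'ERROR: variable not defined in domain'
               else 'OK'
           end as finding length=40
    from dm_spec as s
         full join dm_vars as v
         on upcase(s.name) = upcase(v.name)
    order by varname;
quit;

proc print data=metadata_check noobs;
run;
```

With the toy data, DMDTC should be reported as a note and EXTRA as an error. Type, length, and controlled terminology checks would need additional spec columns and are left out of this sketch.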
SAS Programming Rules when validating
➢ Write well-commented programs.
➢ Use macros so that the same validation code can be reused to verify different programs (re-usability).
➢ Use alternative SAS programming procedures when validating.
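One way the re-usability rule might look in practice is a small macro that wraps PROC COMPARE and reports the outcome in the log. This is a sketch under assumptions: the macro name `qc_compare`, its parameters, the log messages, and the toy data are all hypothetical, and both datasets are assumed to be sorted by the ID variable.

```sas
/* Hypothetical reusable QC macro: names and messages are illustrative only. */
%macro qc_compare(base=, comp=, id=);
    %local rc;

    proc compare base=&base compare=&comp listbase listcomp;
        id &id;
    run;

    /* SYSINFO holds the PROC COMPARE return code and is reset at the next
       step boundary, so it is captured immediately. 0 means no differences. */
    %let rc = &sysinfo;

    %if &rc = 0 %then %put NOTE: QC passed, &base and &comp are identical.;
    %else %put WARNING: QC failed for &base versus &comp (SYSINFO=&rc).;
%mend qc_compare;

/* Toy data: the two programmers disagree on one age value for pt=201. */
data listing;
    input pt age;
    datalines;
121 50
201 41
212 37
;
run;

data validation;
    set listing;
    if pt = 201 then age = 40;
run;

%qc_compare(base=listing, comp=validation, id=pt);
```

The same call can then be repeated for every pair of original and validation datasets, which keeps the verification step uniform across programs.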
How to optimize the process
Good specs & Good standards & Good training =
A Case Study
Example of Derived Datasets Validation (1/4)
• “First Programmer” programs all derived datasets
• “Second Programmer” programs all derived datasets
• PROC COMPARE: compare original derived datasets versus validation derived datasets
Example of Derived Datasets Validation (2/4)
proc compare base=listing compare=validation
             listbase listcomp;
   id pt;
run;

The COMPARE Procedure
Comparison of WORK.LISTING with WORK.VALIDATION
(Method=EXACT)

Observation Summary

Observation      Base  Compare  ID
First Obs           1        1  pt=121
First Unequal      79       79  pt=201
Last Unequal       79       79  pt=201
Last Obs           89       89  pt=212

Number of Observations in Common: 89.
Total Number of Observations Read from WORK.LISTING: 89.
Total Number of Observations Read from WORK.VALIDATION: 89.

Number of Observations with Some Compared Variables Unequal: 1.
Number of Observations with All Compared Variables Equal: 88.

Values Comparison Summary

Number of Variables Compared with All Observations Equal: 3.
Number of Variables Compared with Some Observations Unequal: 1.
Total Number of Values which Compare Unequal: 1.
Maximum Difference: 1.

Variables with Unequal Values

Variable  Type  Len  Label        Ndif   MaxDif
age       NUM     8  AGE (years)     1    1.000

Value Comparison Results for Variables
_________________________________________________________
        ||  AGE (years)
        ||       Base    Compare
     pt ||        age        age      Diff.     % Diff
_______ ||  _________  _________  _________  _________
        ||
    201 ||         41         40    -1.0000    -2.4390
_________________________________________________________
Example of Derived Datasets Validation
The COMPARE Procedure
Comparison of WORK.LISTING with WORK.VALIDATION
(Method=EXACT)

Observation Summary

Observation      Base  Compare  ID
First Obs           1        1  pt=121
Last Obs           89       89  pt=212

Number of Observations in Common: 89.
Total Number of Observations Read from WORK.LISTING: 89.
Total Number of Observations Read from WORK.VALIDATION: 89.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 89.

NOTE: No unequal values were found. All values compared are exactly equal.
Example of Tables Validation (1/3)
• “First Programmer” programs all tables applying the set of layout specifications and saves outputs in Word
• “Second Programmer” programs all tables without adding extra SAS code to control the output layout
Example of Tables Validation (2/3)
First Programmer - Output in Word
________________________________________________________________
                          Tmt A              Tmt B
________________________________________________________________
Age (years)   n           41                 48
              Mean (SD)   51.44 (10.39)      52.10 (11.00)
              Median      55.00              55.00
              Min - Max   30.00 - 66.00      27.00 - 71.00
Gender        Female      14 (34.15%)        21 (43.75%)
              Male        27 (65.85%)        27 (56.25%)
________________________________________________________________

Second Programmer - Output in SAS

/* Age statistics by treatment; demog must be sorted by tmt for BY processing */
proc means data=demog n mean stddev median min max;
   var age;
   by tmt;
run;

/* Gender counts and percentages by treatment */
proc freq data=demog;
   tables gender*tmt;
run;
Example of Listings Validation (1/2)
• “First Programmer” programs all listings applying the set of layout specifications and saves outputs in Word
• “Second Programmer” prints derived datasets in SAS
• Compare listing output in Word versus output in SAS of derived dataset
Example of Listings Validation (2/2)
Listing 1
Demographic Characteristics

Subject ID       Gender   Age   Race
_______________  _______  ____  _____
121              M        50    3
122              M        34    3
123              F        58    3
124              M        64    3
125              M        57    3
126              F        64    3
127              M        39    3
128              M        55    2
129              M        41    3
130              M        44    3
131              M        32    3
132              M        37    3
133              M        61    3
134              F        56    3
135              M        34    3
136              M        34    3
Example: Metrics on Programming Errors

Error type                               Share
Programming                              41%
Specification                            14%
Layout                                   45%

Programming errors
Selection of variables                   14%
Calculation of variables                 20%
SAS programming                          66%

Specification errors
Specification not detailed               40%
Wrong interpretation of specification    60%

Layout errors
Output writing                           56%
Output structure                         30%
Display variables                        14%
Examples of Errors
• Layout: wording of a note in a table
Incorrect: “Percentages are calculated number of patients”
Correct: “Percentages are calculated on number of patients”
• Programming: a boundary value left out of every category
Incorrect (age = 20 is not assigned to any category):
data age;
   set demog;
   if age<20 then age_c=1;
   else if 20<age<40 then age_c=2;
   else if age>=40 then age_c=3;
run;
Correct:
data age;
   set demog;
   if age<20 then age_c=1;
   else if 20<=age<40 then age_c=2;
   else if age>=40 then age_c=3;
run;
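A boundary-value test is one way a second programmer might catch this kind of error. The sketch below is an illustration only: the datasets `boundary` and `check` and the variable `bad_c` are hypothetical names, and the four test ages were chosen to sit on either side of each cut-off.

```sas
/* Test ages placed on both sides of each category boundary */
data boundary;
    input age @@;
    datalines;
19 20 39 40
;
run;

data check;
    set boundary;

    /* Version with the strict inequality: age = 20 matches no branch */
    if age<20 then bad_c=1;
    else if 20<age<40 then bad_c=2;
    else if age>=40 then bad_c=3;

    /* Version with the inclusive lower bound */
    if age<20 then age_c=1;
    else if 20<=age<40 then age_c=2;
    else if age>=40 then age_c=3;
run;

/* For age = 20, bad_c is missing while age_c is 2 */
proc print data=check noobs;
run;
```

Note also that SAS treats a missing numeric value as smaller than any number, so a missing age would satisfy `age<20` and receive category 1 in both versions. Whether that is acceptable is something to confirm against the specification.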