Test Data Management
The Best Practices in TDM
Abhik Kar
Independent Validation Solutions Infosys Technologies Limited
Florida, USA
Debdatta Lahiri
Independent Validation Solutions Infosys Technologies Limited
Chennai, India
Abstract: With growing pressure to meet customer expectations, industries are forced to deliver applications with more features and functionality. Integration of isolated and legacy applications has increased across industries, and applications now talk to a number of interfaces. The volume of data flowing across applications has grown enormously. Variation in data and interaction with multiple interfaces mean that many scenarios are highly dependent on test data. Effective TDM reduces testing cycle time and the risk of defect slippage, increases reusability, and brings consistency to test results. This paper presents our findings and shows effective ways of managing test data in a challenging environment. It describes the benefits of TDM best practices for testers and the other stakeholders involved. It is supported by a real-life case study in which effective TDM reduced testing effort by a significant margin.
Index Terms: Best Practices, Challenges, Cost Savings, Effort Savings, Test Data Management (TDM), Solution
INTRODUCTION
Over time, IT applications have become increasingly complex, voluminous, and difficult to maintain. In any testing project there is a very high volume of data flowing across the applications and modules that interface with each other. Test data plays a vital role in any project where an application is being tested. It therefore becomes a challenge to manage such high volumes of data without considerable effort.
Testers not only have to follow test plans, build test scenarios, and write test scripts; they also have to manage the test data effectively and efficiently. When Test Data Management (TDM) is done effectively, it reduces rework effort considerably, gives better consistency to the entire testing process, and makes it more planned and organized.
The whole testing process becomes much more reliable when one is sure that data is being managed and maintained correctly. Without proper TDM it is very difficult to reach the highest level of accuracy in any form of testing, and we all know how critical the testing process is: nothing can be left to assumption and everything has to be validated with proper data.
In the projects we have seen, tasks related to test data preparation and organization take up about 50-60% of the total effort. Reducing the effort spent on creating and generating test data is critical for any project. With proper TDM this effort can be channeled into other useful work in the project, such as execution and ad hoc testing.
In this paper we examine the process of effective test data management, which reduces the effort spent on maintaining data so that it can be used elsewhere in the project. We also include a case study showing how we implemented effective TDM in practice and how it reduced our effort and time by a large amount.
METHODOLOGY
Understanding the challenges and constraints behind managing and maintaining high volumes of test data
Analysis of the best practices which can be followed to ensure effective maintenance of test data
Limitations of the solutions being suggested
Recommendations on the future scope of improvement in Test Data Management
We take live examples from our projects to show how effectively Test Data Management is being implemented and the benefits we are getting from following such best practices.
TESTING LIFECYCLE AND ITS DEPENDENCE ON TEST DATA
Let us look at the stages of the testing lifecycle and see which of them deal with TDM.
Test Planning
Test Development
Test Execution
Test Reporting
Test Result Analysis
Defect Retesting
Regression Testing
Test Closure
Of the above, every stage from Test Development to Test Closure deals with TDM. If proper test data is not available during test development, the entire process of test case creation, scripting, and scenario development becomes highly unreliable and ineffective. Rework effort goes up, leading to schedule creep, which in turn increases cost to the client.
The execution, reporting, result analysis, defect retesting, and regression testing stages also need accurate data to test the particular functionalities, specifications, and requirements. If that data is not available, the entire purpose is lost and the results become extremely unreliable and unrealistic.
CHALLENGES FOR EFFECTIVE TDM
Extremely high volumes of data are used for most applications.
Different modules of the same application in a project may require different sets of data.
In most scenarios many applications interface with each other, which increases their dependency on the same sets of data and also raises the volume of data to be generated.
Considerable effort is required to create data that has to be used across multiple releases and cycles.
A centralized database is generally shared by many applications, so maintenance becomes another issue.
The defect injection ratio goes up due to manual errors.
Faulty selection of test data.
Unreliable test results due to improper TDM.
Defects may be missed because proper test data is unavailable.
Increase in rework effort as a result of a lack of proper TDM.
Increase in cost to the client as rework effort increases.
Dissatisfied customers/clients.
BEST PRACTICES FOLLOWED IN THE INDUSTRY FOR EFFECTIVE TDM
The first step is to thoroughly understand the requirements for which the data is to be generated.
Always proceed on a step-by-step basis as far as testing is concerned.
Always keep a backup of the test data generated.
Document the entire process for future reference.
Hold internal audits of the test data maintenance process in the project.
SOLUTIONS
A. Selecting the Accurate Test Data with Less Effort is the Sole Objective of Test Data Management
To increase test efficiency it is essential to reduce test effort through new processes, tools, or techniques. Test data management pays off when it selects the right data at the right time with less effort and without compromising quality. This in turn enhances test efficiency.
Test data selection, and with it Test Data Management, is the LEVER of the testing life cycle.
Fig 1: Interrelationship between Test Scope, Test Data and Test Effort
A lever is a simple machine that moves a load by turning on a pivot, or fulcrum. The closer the fulcrum is to the load, the less force is needed to move a heavy load. The lever analogy also fits data driven software testing. Our experience across a number of data driven testing projects has shown the following:
Accurate Test Data Selection is Crucial
The closer the test data is to real production data, the less test effort is required
The closer the test data is to the actual test scope, the less test effort is needed
1) Accurate Test Data Selection is Crucial:
It is easy to say "Give us the right test data and we can test anything." But is it easy to select the right test data?
"Give me a place to stand, and I shall move the earth with a lever" - Archimedes
Accuracy of the test data is extremely important for successful testing, so selecting the right test data with minimum effort is essential. In the lever analogy, the force bar represents the accuracy of the test data. More accurate test data places the fulcrum in the right position, which in turn determines the force, that is, the test effort, required. We have included a case study from one of our data driven testing projects to show the effectiveness of accurate test data selection.
2) The Closer the Test Data is to Real Production Data, the Less Test Effort is Required:
Preparing test data for a test environment involves a lot of thought and analysis. The major activity here is analyzing the requirements and understanding the functionalities. The better we understand the functionalities and create test data that replicates production scenarios, the more effective the testing becomes. Our study has shown that the closer the test data is to real production data, the shorter the test cycle and the higher the test efficiency.
3) The Closer the Test Data is to the Actual Test Scope, the Less Test Effort is Needed:
The test scope includes all the business functionalities and rules that need to be tested. Preparing test data that gives maximum coverage of the test scope requires complete tracking of which data covers which part of the scope. The more completely the prepared test data covers the testing scope, the more confidence grows and the fewer testing cycles are needed.
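The tracking described above can be sketched as a small coverage check. This is a minimal illustration, not the method used in our projects: the rule identifiers, the `DataRecord` type, and the `scope_coverage` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    """One test data record and the business rules it is meant to exercise."""
    record_id: str
    rules_covered: frozenset


def scope_coverage(scope, records):
    """Return (covered fraction, sorted list of uncovered rules) for a test scope."""
    scope = set(scope)
    covered = set()
    for record in records:
        # Only rules that belong to the agreed scope count towards coverage.
        covered |= record.rules_covered & scope
    missing = sorted(scope - covered)
    fraction = len(covered) / len(scope) if scope else 1.0
    return fraction, missing


# Hypothetical rule identifiers; a real project would take them from its requirements.
scope = {"R1-zip-routing", "R2-custom-score", "R3-risk-grading", "R4-fico-merge"}
records = [
    DataRecord("cust-001", frozenset({"R1-zip-routing", "R2-custom-score"})),
    DataRecord("cust-002", frozenset({"R3-risk-grading"})),
]
fraction, missing = scope_coverage(scope, records)
print(f"coverage: {fraction:.0%}, uncovered: {missing}")
```

A list of uncovered rules such as this one tells the tester exactly which additional records are still needed before a test cycle starts.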
B. Automate the Process of Test Data Generation as Far as Possible to Reduce Human Errors
The first step towards effective TDM is automating test data creation with a suitable tool. The tool has to be simple and user friendly, and it must reduce the effort needed to create test data from scratch. It should also be reusable across releases and across different types of applications, and it should be easy to understand how to operate it.
In most projects huge volumes of test data need to be generated from the database. If the entire process is manual, missing important data or injecting defects becomes much more likely. To give the test data more reliability and consistency we should always try to use tools for its creation. This also brings down the time required to prepare the data, and testers can use that time for other productive activities such as test scripting and test case creation.
Example: During a project with a leading US bank, we automated the entire process of test data selection with the help of a tool called CST (Customer Selection Tool). Test data selection involved selecting customers from a huge database provided to us by the client. The customers were selected based on their zip code preference. Depending on their zip codes, their credit history would lie with the concerned US Credit Bureau. Once we automated this process of customer selection, the accuracy of the test data went up and the effort required to generate it went down considerably. Further details are provided in the case study.
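The CST selects records from a client database. As a complementary illustration of automated creation, the sketch below generates synthetic customer rows from a fixed seed, so that every run reproduces the same data. The field names, value ranges, and the `generate_customers` function are assumptions made for this example and are not part of the CST.

```python
import random


def generate_customers(count, zip_codes, seed=0):
    """Create `count` synthetic customer rows; the same seed gives the same rows."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    rows = []
    for index in range(count):
        rows.append({
            "customer_id": f"CUST-{index:05d}",
            "zip_code": rng.choice(zip_codes),
            # Hypothetical field: a credit limit in steps of 500 between 500 and 20000.
            "credit_limit": rng.randrange(500, 20001, 500),
        })
    return rows


zip_codes = ["33101", "10001", "60601"]
batch_a = generate_customers(3, zip_codes, seed=42)
batch_b = generate_customers(3, zip_codes, seed=42)
print(batch_a == batch_b)  # identical seeds reproduce identical data
```

Recording the seed alongside the test results is enough to regenerate the same data set in a later release, which supports the consistency argued for above.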
C. Create Reusable Data
Test data should be created so that it can be reused across releases, which also reduces effort. Reuse can be enabled through data warehousing within the project. Data should be retrievable as and when required, based on the environment to be tested. Testers should always try to create generic test data that can be applied to a number of applications and releases.
Example: During our project with a leading US bank we created test data that could be reused over three successive releases. This cut test data preparation time almost in half, and the consistency of the data also increased.
D. Maintain Version Tracking
Maintain versions of the prepared test data so that changes can be tracked in future.
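One lightweight way to track versions, shown here only as a sketch, is to fingerprint each data set and assign a new version number whenever the fingerprint changes. The `fingerprint` function and the `VersionLog` class are hypothetical helpers; a project could equally keep its test data in an ordinary version control system.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a stable SHA-256 digest of a list of JSON-serializable rows."""
    # sort_keys makes the digest independent of dictionary key order.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionLog:
    """Assigns increasing version numbers to successive states of a data set."""

    def __init__(self):
        self.entries = []  # list of (version number, digest, note)

    def record(self, rows, note):
        digest = fingerprint(rows)
        if self.entries and self.entries[-1][1] == digest:
            return self.entries[-1][0]  # unchanged data keeps its version
        version = len(self.entries) + 1
        self.entries.append((version, digest, note))
        return version


log = VersionLog()
rows = [{"customer_id": "CUST-1", "zip_code": "33101"}]
v1 = log.record(rows, "release 1 baseline")
v1_again = log.record(list(rows), "no change")
rows.append({"customer_id": "CUST-2", "zip_code": "10001"})
v2 = log.record(rows, "added second customer")
print(v1, v1_again, v2)
```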
E. The Project should always have a Proper Configuration Management Plan
An effective CM plan ensures that the test data is maintained properly and in accordance with security and confidentiality agreements. It makes sure that data from all releases is present in the repository for future reference and reuse. Not all data should be accessible to everyone in the project, and where data is accessible it should not be editable by those who have no use for it. This helps to prevent unwanted tampering with the test data.
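The edit restriction described above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any CM tool: the `DataRepository` class and the role names are hypothetical, and the sketch covers only who may edit, not who may read.

```python
from types import MappingProxyType


class DataRepository:
    """Holds test data sets; only named editors may change them."""

    def __init__(self, editors):
        self._editors = set(editors)
        self._datasets = {}

    def store(self, user, name, dataset):
        if user not in self._editors:
            raise PermissionError(f"{user} may not modify test data")
        self._datasets[name] = dict(dataset)  # keep a private copy

    def read(self, name):
        # MappingProxyType is a read-only view: item assignment raises TypeError.
        # Nested mutable values would still need their own protection.
        return MappingProxyType(self._datasets[name])


repo = DataRepository(editors={"test_lead"})
repo.store("test_lead", "release_1", {"CUST-1": "33101"})
view = repo.read("release_1")
print(view["CUST-1"])

try:
    view["CUST-1"] = "00000"
except TypeError:
    print("readers cannot edit the data")

try:
    repo.store("tester", "release_1", {})
except PermissionError as error:
    print(error)
```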
BENEFITS OF THE SOLUTION
The benefits of TDM include:
Fewer defects are injected.
Considerable reduction in effort.
Improved defect detection efficiency due to correctness of data.
Better customer satisfaction due to dollar savings.
Reliable and efficient testing process.
Better quality.
Streamlined efforts.
CASE STUDY
A. Customer Selection Tool
We were involved in a project with a leading US Bank. In the project we had to prepare huge amounts of test data.
1) Problem Statement - Challenges while Preparing the Test Data Manually:
A huge volume of data was provided by the client for the purpose of test data preparation.
The data included customers of all three US Credit Bureaus (TransUnion, Experian, and Equifax).
The bank specified many financial parameters for generating the custom score of each individual customer in the database.
Preparing the test data from the database manually took a huge amount of time.
Because the entire process was manual and many financial parameters were involved, the scope for error also increased.
Defects were injected while preparing the test data due to faulty test data selection.
2) Process of Manually Selecting the Test Data:
A huge database of customers was provided by the client.
We had to select the test data manually from that database.
Customers were selected based on their zip code preference.
Based on their zip code preference, their credit history would be present in the concerned US Credit Bureau.
From the Credit Bureau details a custom score was generated for each customer, based on the different financial parameters given by the bank.
From the custom score and the FICO score (present in the database) the customers were then grouped into categories: Bad Customers -> High Risk -> Medium Risk -> Low Risk.
3) Solution:
Due to all the above mentioned challenges and constraints we decided to automate the process of test data selection with the help of a tool called the CUSTOMER SELECTION TOOL (CST).
a) Features of the CST:
Capable of selecting customers from the Credit Bureau File (database provided by client) based on the right Zip Code preference.
Capable of generating the custom score for each of the customers based on the different financial parameters provided by the bank.
Capable of grading the customers into the various risk categories based on criteria as specified by the client.
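The three features above form a simple pipeline: select by zip code, score, then grade. The sketch below illustrates that flow only. The CST itself is not published here, so the zip-prefix-to-bureau mapping, the scoring weights, the thresholds, and every function name are invented for the example and do not reflect the bank's actual rules.

```python
# All mappings, weights, and thresholds below are invented for illustration.
ZIP_PREFIX_TO_BUREAU = {"3": "Equifax", "1": "Experian", "6": "TransUnion"}
WEIGHTS = {"utilization_pct": -2, "on_time_pct": 3, "years_of_history": 10}
BASE_SCORE = 400


def bureau_for(zip_code):
    """Map a zip code to the bureau assumed to hold the customer's credit history."""
    return ZIP_PREFIX_TO_BUREAU.get(zip_code[:1])


def custom_score(params):
    """Weighted sum of the financial parameters on top of a base score."""
    return BASE_SCORE + sum(WEIGHTS[name] * params[name] for name in WEIGHTS)


def risk_category(custom, fico):
    """Grade a customer from the average of the custom score and the FICO score."""
    combined = (custom + fico) / 2
    if combined < 500:
        return "Bad"
    if combined < 600:
        return "High Risk"
    if combined < 700:
        return "Medium Risk"
    return "Low Risk"


def select_customers(customers, bureau):
    """Select the customers of one bureau and attach score and risk category."""
    selected = []
    for customer in customers:
        if bureau_for(customer["zip_code"]) != bureau:
            continue
        score = custom_score(customer["params"])
        selected.append({
            "customer_id": customer["customer_id"],
            "custom_score": score,
            "category": risk_category(score, customer["fico"]),
        })
    return selected


customers = [
    {"customer_id": "A", "zip_code": "33101", "fico": 720,
     "params": {"utilization_pct": 20, "on_time_pct": 95, "years_of_history": 10}},
    {"customer_id": "B", "zip_code": "33109", "fico": 560,
     "params": {"utilization_pct": 90, "on_time_pct": 60, "years_of_history": 2}},
    {"customer_id": "C", "zip_code": "10001", "fico": 650,
     "params": {"utilization_pct": 50, "on_time_pct": 80, "years_of_history": 5}},
]
for row in select_customers(customers, "Equifax"):
    print(row)
```

Keeping the mapping, weights, and thresholds in data rather than in code is what would let such a tool be reused when the client changes its criteria.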
b) Benefits of using the CST:
Dynamic in customer selection and capable of handling large volumes of test data.
Data independent and completely reusable.
Eliminates a lot of manual effort in selecting the test data.
Takes into consideration all the financial parameters specified by the bank.
Provides the output in our desired form.
Completely reusable across releases and across different projects of the client.
4) Average Time (in minutes) for Selecting Customer Data in Each Project:
Fig 2: Time to prepare test data traditionally vs. using CST
5) Savings Quantified:
Test data preparation time was measured per customer record.
Traditional approach (manual): 12 minutes on average
With CST: 2 minutes on average
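The two averages above imply a saving of 10 minutes per customer record, a reduction of about 83%. The short calculation below makes the arithmetic explicit; the batch size of 1,000 customers is a hypothetical figure chosen only to show how the saving scales.

```python
MANUAL_MINUTES = 12  # average per customer record, traditional approach
CST_MINUTES = 2      # average per customer record, with CST


def savings(manual, automated, records=1):
    """Return (minutes saved for `records` records, fractional reduction per record)."""
    saved = (manual - automated) * records
    reduction = (manual - automated) / manual
    return saved, reduction


saved, reduction = savings(MANUAL_MINUTES, CST_MINUTES)
print(f"{saved} minutes saved per customer, {reduction:.1%} reduction")

# Hypothetical batch size, not a figure from the case study.
batch_saved, _ = savings(MANUAL_MINUTES, CST_MINUTES, records=1000)
print(f"{batch_saved / 60:.1f} hours saved for 1,000 customers")
```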
ACKNOWLEDGEMENT
We would like to thank Mrs. Yogita Sachdeva, Senior Project Manager - IVS, Infosys, for her help and encouragement during the drafting of this paper. Without her enthusiastic support we would not have been able to complete it, and we sincerely acknowledge her guidance.
We would also like to thank Mrs. Alice Thankachan, Infosys, for the tremendous encouragement she gave us when we decided to write this paper. She helped us in every possible way in the course of writing it and gave us the guidance we needed. We are sincerely grateful to both of them.
BIOGRAPHIES
Abhik Kar is a Process & Domain Consultant at Infosys Technologies Limited. He has led many large testing projects, especially in the banking sector. His major areas of interest include test process improvement and test management.
Debdatta Lahiri is a Process & Domain Consultant at Infosys Technologies Limited. She has been associated with testing projects, especially in the banking sector, working with leading banks in the industry. Her areas of interest mainly include social networking and communication.