Test Data Management: A Process Framework
testing of applications, plays a vital role in the IT system of any organization. An increasing number of organizations are requesting “Test Data Management” as a managed, centralized IT service from their vendors, mainly with the objectives of realizing cost benefits, reduced time-to-market and improved quality of the end product. However, Test Data Management is not as simple as it seems at first glance and comes with its own set of challenges. The intent of this paper is to discuss the challenges and typical practices in providing Test Data Management services, and to share a process solution that addresses the challenges.
About the Authors
Dinesh Nittur
Dinesh Nittur has 11 years of experience in the IT industry. He holds a Bachelor's degree in Mechanical Engineering from Bangalore University. He has worked in various domains like insurance, banking and capital markets, with a major focus on QA and testing.
Tithiparna Sengupta
Tithiparna Sengupta is a Test Strategy Consultant at Tata Consultancy Services Ltd. She has 9 years of experience in the IT industry, with 6+ years of experience in different areas of software testing. She has worked on testing-related engagements for TCS clients in the Financial Services domain. She holds a Bachelor's degree in Chemical Engineering from Jadavpur University, Kolkata and a post-graduate degree in the same discipline from Indian Institute of Technology, Kanpur.
Table of Contents
1. Introduction
2. Challenges
3. Solution
4. Typical Practices
5. On-the-job Observations
6. Process Framework
7. Benefits
8. References/Future Study
Introduction
In today’s competitive market, organizations change their business trends and strategies frequently to sustain growth. These changes result in the development of new IT applications or in changes to existing ones, which in turn create various data needs in sub-production environments to facilitate development, enhancement and maintenance of the IT applications and, of course, testing.
Test Data Management thus plays a vital role in the IT system of any organization. Ineffective test data management may lead to:
• Inadequate testing and thus poor quality of product
• Increased time-to-market
• Increased costs from redundant operations and rework
• Non-compliance with regulatory norms on data confidentiality
In view of the above, an increasing number of organizations are requesting “Test Data Management” as a managed, centralized IT service from their vendors. The major benefits that any organization seeks from such a service are:
• Reduced cost in test data management
• Improved data quality, which will contribute to improved testing and thus improved quality of end product
• Timely data delivery, which will contribute to faster test executions and thus reduced time-to-market
• Strict adherence to regulatory norms on data confidentiality (more than a benefit, this is a “must-have”)
• Freeing up of the bandwidth of developers and testers for their “core” work, that is, code development and testing
A well-defined process framework, along with effective use of tools for test data creation and masking, can enable a fast realization of the above benefits.
Challenges
At first glance, Test Data Management may seem simple, but a closer involvement reveals that it is quite a tricky job that comes with the following challenges:
• Complexity of data requirements – each project usually involves multiple application teams requiring data to be synchronized between applications; each application is simultaneously involved in multiple projects, resulting in contention for environments and so on (Ref: Figure 1).
• Analysis of data requirements due to lack of information on the existing data – in most organizations, the Production and sub-production environments are not “profiled” and thus gap analysis between “as-is” and “to-be” can prove to be a major challenge.
• Scope creep – there are often frequent changes in data requirements due to changes in business requirements, and handling this scope creep, particularly when it comes late, can be a major challenge.
• Sudden and immediate requests for test data during test execution – catering to these types of requests requires a lot of agility since the allowed turnaround time is very short.
• Adherence to regulatory compliance like data confidentiality – data in the test environments cannot be a direct copy of production; confidential information must be masked before loading data. Despite using tools for this purpose, the process of data masking still adds some time to the overall cycle time for servicing any request that involves environment refresh with production data.
• Assurance of data safety or security - there was initially no well-defined policy on test data storage strategy with version-control, access security and back-up mechanisms. Thus, there was always a risk of crisis situations resulting from unanticipated loss of database.
• Reliance on production data for loading to test environment - this is always a challenge due to the huge volume of production data and the chance of disruption to production systems due to repeated data requests.
• Coordination – the test data management team has to coordinate with application teams, infrastructure team, Database Administrators (DBAs) and so on. Coordination with multiple stakeholders can at times be quite a challenging task.
• Lack of a proper process framework to manage the activities related to Test Data Management
• Ensuring proper data distribution so as to prevent:
- Data contention between multiple projects
- Redundant or unused data in any region
• Ensuring data reuse
• Managing the impact of data refresh on ongoing projects
• Identifying the right region that caters to the needs of all the applications within a project
The figure below is a diagrammatic representation of complexity of data requests spanning multiple projects, applications, and environments:
Figure 1: Complexity of Test Data Requests (diagram not reproduced: it shows data requests for test data raised by multiple projects, Prj 1 to Prj n, against applications App 1 to App n in the Dev environment and in Test environments 1 to n)
Solution
Typically, Test Data Management services can be segregated into the following four categories:
1. Initial test data set-up and/or synchronization of test data across applications. This is a one-time job that is executed by the Test Data Management team right after it is established.
2. Servicing data requirements for project(s). Projects are again of two categories:
a. Development of a new application, which may thus require test data creation from scratch
b. Enhancement or maintenance of existing application(s) only
3. Regular Maintenance or Support – servicing:
a. Simple data requests
b. Change requests (CRs), that is, change in data requirements
c. Problem Reports (PRs), that is, problems reported in data delivered
4. Perfective Maintenance – scheduled maintenance of test beds on an annual frequency
Here we will talk about a process framework that takes care of all the services listed above. The process framework is derived from the typical practices that are followed for test data management and some on-the-job observations. Through this process framework, we have tried to address all the challenges mentioned above.
Typical Practices
These are the typical best practices that are followed in Test Data management:
• As part of initial data set-up and/or data synchronization across applications, a “Data Profiling” exercise is carried out for all the environments (Production and sub-production). Data Profiling allows us to understand what is in our production data as well as what currently exists in our test data; it is the process of collecting information and documenting the characteristics of the data in terms of data source, data attributes, relationships, values, dependencies and domains. After the first exercise, the data profiling exercise is typically repeated as part of scheduled perfective maintenance.
• Test Data requests are captured in a standardized test data request form so as to avoid gaps in the requirements provided.
• The most current data profiles are used to analyze the gaps between current environments and data requirements.
• All data requests are analyzed for gaps with current environments and impact on other projects.
• Ideally, project teams should finalize their Test Data requirements and put in their data requests at the end of the Design phase of the SDLC itself.
• The Test Data Manager (TDM) holds calls, meetings and formal reviews of documents to ensure alignment of all teams on test data requirements and data set-up strategy.
• Both test data requirements and data set-up strategy are signed off by key stakeholders, which include the
• Data refresh should ideally be the last option of the team, resorted to only after exploring possibilities like:
- Use of existing bed
- Sharing of test bed with other project (this minimizes redundant and unused data in any test bed)
- Creation of “partial” (only the missing) data in the test bed and so on
• If going for 'data refresh', instead of refreshing with production data or data slices, test data management teams often first explore the possibility of refreshing with a data dump from other test regions. Not all applications may be able to access all test regions within the test environment. Based on data needs from projects, test data are pulled from different test regions, synchronized and then loaded to the target test region. This practice ensures minimum disruption to production systems due to repeated data requests.
• For loading production data or data slices to the test environment, tools are used to first mask the data prior to loading. Data are masked in both database tables and files.
• For creating data, it is common to use tools like shell scripts, SQL procedures etc.
• Test data management teams ideally should maintain a Data Distribution Log. This helps to prevent data contention issues and also to quickly identify the data that is available for distribution from one project to another.
• Data Version Control and Data Cataloging - after data set-up, the test data management team should store away the copies/back-ups of the data (both the files and databases), assign a version number to the data and then make an entry of the version number along with data details in the Data Catalog. This allows the restoration of the environment to its original condition with minimum effort, as and when required.
• The test data management team should implement a well-documented set of Data Security policies (data version control, back-up, storage and access policies).
• Regular maintenance of test beds should be carried out on an annual (or any other defined frequency) basis so as to preserve the synchronization between data of the different applications in the same region and facilitate re-usability of data that have once been set up.
• The test data management team should build standardized templates for project or application teams to raise data requests as well as CRs and PRs, and also standardized templates for all their deliverables as well as internal-use documents, for example, Data Requirements document, Data Profile document, Analysis document, Data Strategy document, Data Distribution Log and Data Catalog.
• In many organizations, the Data Strategy document is made a part of the Master Test Plan. In such cases, the Master Test Plan template includes a very detailed section called “Test Data Plan”, with sub-sections on strategy for test data set-up and the plan for various test data set-up activities (list of activities, time lines, estimated efforts and so on), risks and communication plan. The Test Data Management team, based on the data requirements and subsequent analysis of the same, prepares the Data Plan, which in turn is incorporated into the Master Test Plan and then signed off. This also helps the Test Manager to review and monitor the test data management activities.
• While loading production data to the test environment, instead of using the whole data dump, a data sampling technique can be used to derive a production slice. Sampling allows extraction of a small representative set of data from production for use in testing. While adoption of the data sampling practice calls for a certain level of technical maturity from the Test Data Management team, the practice enables the teams to work with even smaller volumes of production data without compromising on data requirements coverage.
• Instead of reverting the environment to the previous version of data, tool(s) can be used to restore the used data to its original “unused” condition. In many organizations, existing GUI-based functional automation scripts are used for this purpose. For instance, “open” accounts that have been “closed” upon test execution can be “re-opened” from the GUI, through test automation scripts, to make the account numbers reusable.
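As a rough illustration of the data profiling and gap analysis practices listed above, the following Python sketch profiles a small set of records and reports which required values are missing from a test bed. The record layout, the field names (`account_type`, `status`) and the helper names `profile` and `find_gaps` are assumptions made for this example only; a real profile would also cover data sources, relationships and dependencies.

```python
from collections import Counter


def profile(records):
    """Summarize each column: row count, null count and value frequencies."""
    columns = {}
    for row in records:
        for name, value in row.items():
            col = columns.setdefault(
                name, {"rows": 0, "nulls": 0, "values": Counter()}
            )
            col["rows"] += 1
            if value is None:
                col["nulls"] += 1
            else:
                col["values"][value] += 1
    return columns


def find_gaps(data_profile, required):
    """Return the required (column, value) pairs that the profiled data lacks."""
    gaps = []
    for column, values in required.items():
        present = data_profile.get(column, {"values": Counter()})["values"]
        for value in values:
            if present[value] == 0:  # a Counter returns 0 for unseen values
                gaps.append((column, value))
    return gaps


# Hypothetical test-bed content and project data requirements.
test_bed = [
    {"account_type": "savings", "status": "open"},
    {"account_type": "savings", "status": "closed"},
    {"account_type": "current", "status": None},
]
required = {"account_type": ["savings", "loan"], "status": ["open", "dormant"]}

test_profile = profile(test_bed)
gaps = find_gaps(test_profile, required)
print(gaps)
```

The list of gaps is the kind of input that could then drive the choice between reusing the existing bed, creating only the missing data, or refreshing the region.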
On-the-job observations
These are the key on-the-job observations of the practitioners:
• The key to completing the actual data set-up process is effective coordination with all the teams involved and prior communication of all activities.
• Lack of proper connectivity with test regions often hinders test data set-up and/or subsequent validation of the set-up. Connectivity with the environment should be verified early on, before proceeding with the actual data set-up.
• Almost every test region has data redundancy or unused data, and distributing unused data from one project to another is a very quick and effective way of handling sudden data requests.
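The observation about redistributing unused data, together with the Data Distribution Log practice described earlier, can be sketched as follows. This is a minimal in-memory model, assuming that data items are account numbers tagged to at most one project at a time; the class name `DistributionLog` and its methods are hypothetical and are not part of any tool named in this paper.

```python
class DistributionLog:
    """Tracks which project, if any, each data item is allocated to."""

    def __init__(self, items):
        # None means the item is unused and free for distribution.
        self._owner = {item: None for item in items}

    def allocate(self, project, count):
        """Tag `count` unused items to `project`; fail rather than share items."""
        free = [item for item, owner in self._owner.items() if owner is None]
        if len(free) < count:
            raise ValueError(f"only {len(free)} unused items available")
        chosen = free[:count]
        for item in chosen:
            self._owner[item] = project
        return chosen

    def release(self, project):
        """Return a finished project's items to the unused pool."""
        for item, owner in self._owner.items():
            if owner == project:
                self._owner[item] = None

    def unused(self):
        """List the items that are not tagged to any project."""
        return [item for item, owner in self._owner.items() if owner is None]


log = DistributionLog(["ACC-001", "ACC-002", "ACC-003"])
first = log.allocate("Prj 1", 2)   # planned request
second = log.allocate("Prj 2", 1)  # planned request
log.release("Prj 1")               # Prj 1 finishes; its data becomes unused
third = log.allocate("Prj 3", 2)   # sudden request served from unused data
```

Refusing an allocation when too few unused items remain is one possible way to surface data contention early, instead of silently sharing the same accounts between projects.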
Process Framework
We have segregated the Test Data Management practices into various project phases – Planning or Kick-off, Analysis, Design, Build and Maintenance – and defined an ETVX (Entry criteria, Tasks, Validation, Exit criteria) model for each of the phases as described in the following table:
Phase: Planning-cum-Kick-off
Entry criteria:
• Test Data Manager (TDM) identified for test management services
Tasks:
• Acquire initial understanding of the test data landscape in the organization through meetings, questionnaire and so on.
• Build templates for:
1. Data Request form
2. Data Profile document
3. Data Requirements document
4. Analysis Report
5. Data Strategy Document or Data Plan (can be made a subset of Master Test Plan)
6. Data distribution log (log of request, fulfillment details, region, characteristics of data provided and so on)
7. Data Catalog
• Identify SPOCs from each application team, DBAs and so on.
• Conduct meetings with SPOCs.
• Prepare a list of ongoing projects with start and end dates, applications involved, test regions being used by each project or application and so on.
• Prepare “Data Landscape” document (a single-stop document containing information like list of test regions, applications, types of data stores (files or database), typical frequency of data requests for each application and so on).
• Establish SLAs for delivering test data management services.
• Set up a team for Test Data Management (can be a virtual team comprising members from different application teams and DBAs).
Validation:
• Review Test Data Landscape document
• Review Templates
• Review list of ongoing projects
Exit Criteria:
• Signed-off Templates
• Signed-off Test Data Landscape document
Work Items:
• Questionnaire
• Test Data Landscape document
• Templates
Phase: Analysis
Entry criteria:
• Templates are available
• Signed-off Test Data Landscape document is available
Tasks:
• Carry out Data profiling exercise for each of the individual data stores across the enterprise, for ALL regions, including Production (optional, applicable only if it is an initial data set-up or synchronization exercise).
• Assign version number to existing data in all environments and enter them in the Data Catalog (optional, applicable only for initial data set-up or synchronization; else, this is an activity that is done only during the build phase or maintenance phase, after setting up the test data and validating its correctness).
• Collect Test Data Requirements (can be for specific project(s) or for initial data set-up and synchronization across applications) through calls, meetings, data request forms and so on.
• Consolidate requirements provided in Data Request forms in Data Requirements document.
• Update list of ongoing projects.
• Analyze data requirements, latest data distribution log (if present) and existing test bed to identify the following:
1. Gaps between requirements and current test beds
2. Gaps between requirements and current data profiles (Is the data requirement similar to the production data profile? Or have some very typical data scenarios in production been missed out? Is the data requirement similar to the data profile of any test environment?)
3. Impact of data modification on ongoing projects.
• Prepare Analysis report.
• Define Data Security policies - version control, back-up, storage and access policies (one-time activity).
Validation:
• Review Data Profile document (optional, applicable only for initial data set-up or synchronization)
• Review Data Catalog (optional, applicable only for initial data set-up or synchronization)
• Review data requirements (through calls, meetings and formal review of requirements document)
• Review updated list of projects
• Review Analysis document
• Review Data Security Policy document (one-time)
Exit Criteria:
• Reviewed Data Profile documents (optional, applicable only for initial data set-up or synchronization)
• Reviewed Data Catalog (optional, applicable only for initial data set-up or synchronization)
• Signed-off test data requirements
• Reviewed Analysis document
• Signed-off Data Security Policy document (one-time only)
Work Items:
• Data Request form(s)
• Data Requirements document
• Data Profile document (optional)
• Data Catalog (optional)
• Analysis document
• Data Distribution Log
• Data Security Policy document
Phase: Design
Entry criteria:
• Signed-off test data requirements are available.
• Signed-off Analysis document is available.
Tasks:
• Decide the strategy for data preparation:
- Identify test region(s) where data need to be loaded or refreshed
- Identify methods to be used, for example:
o Obtain partial data (only the “missing” data) from other region
o Restore “used” data to original “unused” state
o Refresh with data dumps (production slice or other regions)
o Refresh with old back-ups
o Mask data
o Create new data
o Distribute unused data from other projects
- Identify data sources and providers
- Identify tools for data extracting, masking, creating, loading and so on
- Identify Data distribution plan (e.g., tagging account numbers to projects etc.)
- Identify coordination and communication plan
• Plan for test design activities – list of activities, time lines, efforts, risks, communication plan etc.
• Create Data Strategy document or Data Plan (containing both the strategy and planning details)
Validation:
• Review and align the strategy through calls and meetings
• Review Test Data Plan document
Exit Criteria:
• Signed-off Data Strategy Document or Data Plan
Work Items:
• Data Requirements document
• Data Distribution Log
• Analysis document
• Data Strategy document or Data Plan
Phase: Build
Entry criteria:
• Signed-off test data plan is available.
• Environments for data set-up are available and connectivity with them is established and validated.
• Identified tools are available.
Tasks:
• Execute activities like the following, identified in test data plan:
- Take back-up of data from source region (Production or other sub-Production regions)
- Carry out masking (optional)
- Load data dump (masked or unmasked) to target region
- Create Data (manual or using tools) in target region.
• Communicate data readiness, along with data distribution details, and request validation of environment and data readiness.
• Take back-ups (for future reuse) of the new data (both databases and files) once data is set up.
• Assign version number to the back-up and catalog it with proper description.
• Update Data Distribution log.
• Update Data Profile document, if required. (Especially applicable for initial data set-up or synchronization, wherein at the end of build phase, gaps between production and test environments are closed, thus updating the data profile of the test environments.)
Validation:
• Validate data set-up correctness and readiness
• Review updated data distribution log
• Review Data version and updated catalog
• Review Data Profile document (optional, applicable in specific cases only)
Exit Criteria:
• Signed-off test environment
• Reviewed updated data distribution log
• Reviewed updated Data catalog
• Reviewed updated Data Profile
Work Items:
• Data Plan
• Data catalog
• Data Distribution Log
• Data Profile document (optional, applicable in specific cases only)
Phase: Maintenance
Entry criteria:
• Simple data requests/Late change in data requirements (to be captured through a CR)/Sudden or unplanned data requirements during test execution/Scheduled perfective maintenance
Tasks:
Support change request/unplanned data needs/problem reports or incidents on data delivered:
• Create or update Data requirements document
• Assign priority to the request, in case of multiple requests
• Analyze requirements to evaluate:
- If data requirements can be furnished from existing test bed by data reuse, modification and other means
- If required data can be shared from another project with same data characteristics.
• Perform necessary steps, for example:
- Identify data that may be reused or shared
- Modify data
• Communicate data readiness, along with data distribution details, and request validation of environment and data readiness.
• Take back-up (for future reuse) of new data once data is fixed or set up.
• Assign version number to the back-up and catalog it with proper description.
• Update Data Distribution log.
• If necessary (i.e. where “quick” request resolutions are deemed unfeasible), then redirect request to be serviced through elaborate Analyze->Design->Build phases.
Scheduled perfective maintenance:
• Review status of ongoing projects and decide, upon analysis, the schedule for maintenance.
• Communicate schedule of maintenance to all project or application teams.
• Carry out Data profiling exercise for each of the individual data stores across the enterprise, for ALL regions, including Production.
• Assess gaps in the test bed and close them.
• If required, refresh test region with old back-up of the test region so as to restore it to its original state where all data are useable as-is and data for all applications are in sync.
• Update Data Distribution log with details of 'refresh' (region(s) refreshed, date of refresh, date of dump with which environment has been refreshed and so on).
Validation:
• Validate data set-up correctness and readiness
• Review updated data distribution log
• Review updated data profile document (applicable only for perfective maintenance)
• Review Data version and catalog
Exit Criteria:
• Signed-off test environment
• Updated data distribution log
• Reviewed Data catalog
• Reviewed updated Data Profile Document (only for perfective maintenance)
Work Items:
• Data Request form
• Data Requirements document
• Data Distribution Log
• Data Profile document (only for perfective maintenance)
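The optional masking step in the Build phase, which the typical practices require before production data or data slices are loaded to a test environment, might look roughly like the sketch below. It uses a keyed hash (HMAC-SHA256 from the Python standard library) so that the same production value always maps to the same masked value, which keeps masked data consistent across applications. The key, the field names and the 12-character token length are assumptions made for this example, and the sketch does not describe how any particular masking tool works.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"   # hypothetical key for the example
SENSITIVE_FIELDS = ("name", "account_number")    # assumed confidential columns


def mask_value(value, key=SECRET_KEY):
    """Replace a value with a keyed, deterministic, non-reversible token."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def mask_record(record):
    """Mask only the sensitive fields; leave all other fields untouched."""
    return {
        field: mask_value(value) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


# Hypothetical production slice to be loaded to a test region.
production_rows = [
    {"name": "A. Customer", "account_number": "1002003", "balance": 250.0},
    {"name": "B. Customer", "account_number": "1002004", "balance": 75.5},
]
masked_rows = [mask_record(row) for row in production_rows]
```

Because the mapping is deterministic, an account number masked in a database table and the same number masked in a flat file would still match after loading. The trade-off is that the key has to be protected as carefully as the data itself.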
Benefits
The deployment of the process framework described above has been seen to yield the following benefits:
• Reduction in cost to organization by as much as 30 - 40% from:
o Reduced data set-up effort due to streamlined processes
o Reduced rework due to “first-time correct” deliveries
o Increased use of tools for data creation
o Improved data reuse (approximately 80% data reuse) using previous backed-up data versions
o Reduced data redundancy through environment sharing between projects
o Reduced number of environment refreshes due to exploration of other options like data sharing, creating only the missing data and so on.
• Reduced disruption to Production services due to reduced requests for production data dumps
• Reduced time-to-market – at least, no delays due to delay in test data delivery
• Improved data quality owing to thorough analysis of data requirements and thus improved testing
• Increased data security and recoverability from well-defined data security policy
References/Future Study
[1] We spoke with practitioners across TCS, i.e. Test Data Managers and members of multiple teams that are providing Test Data Management services.
[2] Masketeer™ – TCS tool for test data masking - http://www.tcs-trddc.com/TECS'09/masketeer.pdf
[3] Testify – TCS tool for test data creation
Acknowledgments
We acknowledge all the people in TCS, Vinita M and Chaitra Puttaswamy in particular, who helped us write this white paper by providing valuable information and suggestions.
All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.
Copyright © 2009 Tata Consultancy Services Limited
outsourcing organization that delivers real results to global businesses, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled services delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development.
A part of the Tata Group, India’s largest industrial conglomerate, TCS has over 143,000 of the world's best trained IT consultants in 42 countries. The company generated consolidated revenues of US $6 billion for fiscal year ended 31 March 2009 and is listed on the National Stock Exchange and Bombay Stock Exchange in India. For more information, visit us at www.tcs.com
gained through the numerous mission critical projects that have been successfully delivered to our eminent BFS clients across the Globe. This is strengthened by the BFS Industry Practice, which has professionals who have served across Global, Regional and National Financial Institutions in various lines of BFS business and operations. Our global focus, deep industry knowledge and commitment to understanding and satisfying client needs have been critical to our successes.
The BFS Industry Practice is organised to deliver value to our clients across the multiple BFS-Industry Solution units, Insurance ISU, New Growth Markets and Emerging Market IOUs.