
Test Data Management. A Process Framework


A Process Framework

Test Data Management, which supports the testing of applications, plays a vital role in the IT system of any organization. An increasing number of organizations are requesting “Test Data Management” as a managed, centralized IT service from their vendors, mainly with the objectives of realizing cost benefits, reduced time-to-market and improved quality of the end product. However, Test Data Management is not as simple as it seems at first glance and comes with its own set of challenges.

The intent of this paper is to discuss the challenges and typical practices in providing Test Data Management services, and to share a process solution that addresses the challenges.


About the Authors

Dinesh Nittur

Dinesh Nittur has 11 years of experience in the IT industry. He holds a Bachelor's degree in Mechanical Engineering from Bangalore University. He has worked in various domains such as insurance, banking and capital markets, with a major focus on QA and testing.

Tithiparna Sengupta

Tithiparna Sengupta is a Test Strategy Consultant at Tata Consultancy Services Ltd. She has 9 years of experience in the IT industry, including 6+ years in different areas of software testing. She has worked on testing-related engagements for TCS clients in the Financial Services domain. She holds a Bachelor's degree in Chemical Engineering from Jadavpur University, Kolkata and a post-graduate degree in the same discipline from the Indian Institute of Technology, Kanpur.

Table of Contents

1. Introduction

2. Challenges

3. Solution

   Typical Practices

   On-the-job Observations

   Process Framework

7. Benefits

8. References/Future Study


Introduction

In today’s competitive market, organizations frequently change their business trends and strategies to sustain growth. These changes lead to the development of new IT applications or to changes in existing ones, which in turn create various data needs in sub-production environments to support development, enhancement, maintenance and, of course, testing of the IT applications.

Test Data Management thus plays a vital role in the IT system of any organization. Ineffective test data management may lead to:

• Inadequate testing and thus poor quality of product

• Increased time-to-market

• Increased costs from redundant operations and rework

• Non-compliance with regulatory norms on data confidentiality

In view of the above, an increasing number of organizations are requesting “Test Data Management” as a managed, centralized IT service from their vendors. The major benefits that any organization seeks from such a service are:

• Reduced cost in test data management

• Improved data quality, which will contribute to improved testing and thus improved quality of the end product

• Timely data delivery, which will contribute to faster test executions and thus reduced time-to-market

• Strict adherence to regulatory norms on data confidentiality (more than a benefit, this is a “must-have”)

• Freeing up of the bandwidth of developers and testers for their “core” work, that is, code development and testing

A well-defined process framework, along with effective use of tools for test data creation and masking, can enable a fast realization of the above benefits.

Challenges

At first glance, Test Data Management may seem simple, but closer involvement reveals that it is quite a tricky job that comes with the following challenges:

• Complexity of data requirements: each project usually involves multiple application teams requiring data to be synchronized between applications; each application is simultaneously involved in multiple projects, resulting in contention for environments and so on (Ref: Figure 1).

• Analysis of data requirements in the absence of information on the existing data: in most organizations, the Production and sub-production environments are not “profiled”, and thus gap analysis between “as-is” and “to-be” can prove to be a major challenge.

• Scope creep: data requirements often change frequently because business requirements change, and handling these changes, particularly the late ones, can be a major challenge.

• Sudden and immediate requests for test data during test execution: catering to these types of requests requires a lot of agility since the allowed turnaround time is very short.

• Adherence to regulatory compliance such as data confidentiality: data in the test environments cannot be a direct copy of production; confidential information must be masked before loading data. Even when tools are used for this purpose, data masking still adds some time to the overall cycle time for servicing any request that involves an environment refresh with production data.

• Assurance of data safety or security: where there is no well-defined policy on test data storage covering version control, access security and back-up mechanisms, there is always a risk of crisis situations resulting from unanticipated loss of a database.

• Reliance on production data for loading to the test environment: this is always a challenge due to the huge volume of production data and the chance of disruption to production systems due to repeated data requests.

• Coordination: the test data management team has to coordinate with application teams, the infrastructure team, Database Administrators (DBAs) and so on. Coordination with multiple stakeholders can at times be quite a challenging task.

• Lack of a proper process framework to manage the activities related to Test Data Management

• Ensuring proper data distribution so as to prevent:

  - Data contention between multiple projects

  - Redundant or unused data in any region

• Ensuring data reuse

• Managing the impact of data refresh on ongoing projects

• Identifying the right region that caters to the needs of all the applications within a project

The figure below is a diagrammatic representation of complexity of data requests spanning multiple projects, applications, and environments:

Figure 1: Complexity of Test Data Requests (diagram not reproduced: the business and projects Prj 1 to Prj n raise data requests against applications App 1 to App n in the Dev environment and in Test environments 1 to n, all of which draw on the test data)
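The gap analysis between “as-is” and “to-be” named in the challenges above can be illustrated with a minimal sketch. It assumes a profile is simply a count of values per attribute; the function names `profile` and `find_gaps`, the attributes and the sample rows are all hypothetical and are not taken from any TCS tool.

```python
from collections import Counter


def profile(rows, attributes):
    """Build a simple data profile: value counts for each attribute."""
    return {attr: Counter(row[attr] for row in rows) for attr in attributes}


def find_gaps(required, current_profile):
    """Return the shortfall for every (attribute, value) the project needs.

    `required` maps (attribute, value) to the number of records needed.
    Only requirements that the current test bed cannot meet are returned.
    """
    gaps = {}
    for (attr, value), needed in required.items():
        available = current_profile.get(attr, Counter())[value]
        if available < needed:
            gaps[(attr, value)] = needed - available
    return gaps


# Hypothetical "as-is" content of a test region.
test_region = [
    {"account_type": "savings", "status": "open"},
    {"account_type": "savings", "status": "closed"},
    {"account_type": "current", "status": "open"},
]
as_is = profile(test_region, ["account_type", "status"])

# Hypothetical "to-be" requirement raised by a project.
to_be = {
    ("account_type", "savings"): 2,
    ("account_type", "loan"): 1,
    ("status", "open"): 3,
}
gaps = find_gaps(to_be, as_is)
```

Here the region already holds enough savings accounts, so only the missing loan account and the one missing open account are reported. A real profile would also cover relationships, dependencies and domains, which this sketch leaves out.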


Solution

Typically, Test Data Management services can be segregated into the following four categories:

1. Initial test data set-up and/or synchronization of test data across applications. This is a one-time job that is executed by the Test Data Management team right after it is established.

2. Servicing data requirements for project(s). Projects in turn fall into two categories:

   a. Development of a new application, which may thus require test data creation from scratch

   b. Enhancement or maintenance of existing application(s) only

3. Regular Maintenance or Support, that is, servicing:

   a. Simple data requests

   b. Change requests (CRs), that is, changes in data requirements

   c. Problem Reports (PRs), that is, problems reported in data delivered

4. Perfective Maintenance: scheduled maintenance of test beds, typically once a year

This paper presents a process framework that covers all the services listed above. The framework is derived from the typical practices followed in test data management and from some on-the-job observations, and it is intended to address all the challenges mentioned above.

Typical Practices

These are the typical best practices followed in Test Data Management:

• As part of initial data set-up and/or data synchronization across applications, a “Data Profiling” exercise is carried out for all the environments (Production and sub-production). Data Profiling allows us to understand what is in our production data as well as what currently exists in our test data; it is the process of collecting information and documenting the characteristics of the data in terms of data source, data attributes, relationships, values, dependencies and domains. After the first exercise, data profiling is typically repeated as part of scheduled perfective maintenance.

• Test data requests are captured in a standardized test data request form so as to avoid gaps in the requirements provided.

• The most current data profiles are used to analyze the gaps between current environments and data requirements.

• All data requests are analyzed for gaps with current environments and for impact on other projects.

• Ideally, project teams should finalize their test data requirements and put in their data requests at the end of the Design phase of the SDLC itself.

• The Test Data Manager (TDM) holds calls, meetings and formal reviews of documents to ensure alignment of all teams on test data requirements and data set-up strategy.

• Both the test data requirements and the data set-up strategy are signed off by the key stakeholders.


• Data refresh should ideally be the last option of the team, resorted to only after exploring possibilities like:

  - Use of an existing test bed

  - Sharing of a test bed with another project (this minimizes redundant and unused data in any test bed)

  - Creation of “partial” (only the missing) data in the test bed and so on

• If going for “data refresh”, instead of refreshing with production data or data slices, test data management teams often first explore the possibility of refreshing with a data dump from other test regions. Not every application may be able to access every test region within the test environment. Based on data needs from projects, test data are pulled from different test regions, synchronized and then loaded to the target test region. This practice minimizes the disruption to production systems caused by repeated data requests.

• For loading production data or data slices to the test environment, tools are used to mask the data prior to loading. Data are masked in database tables as well as in files.

• For creating data, it is common to use tools like shell scripts, SQL procedures etc.

• Test data management teams should ideally maintain a Data Distribution Log. This helps to prevent data contention issues and also to quickly identify the data that is available for distribution from one project to another.

• Data Version Control and Data Cataloging: after data set-up, the test data management team should store the copies or back-ups of the data (both the files and the databases), assign a version number to the data and then make an entry of the version number along with data details in the Data Catalog. This allows the environment to be restored to its original condition with minimum effort, as and when required.

• The test data management team should implement a well-documented set of Data Security policies (data version control, back-up, storage and access policies).

• Regular maintenance of test beds should be carried out annually (or at any other defined frequency) so as to preserve the synchronization between data of the different applications in the same region and to facilitate reuse of data that have been set up once.

• The test data management team should build standardized templates for project or application teams to raise data requests as well as CRs and PRs, and also standardized templates for all their deliverables and internal-use documents, for example, Data Requirements document, Data Profile document, Analysis document, Data Strategy document, Data Distribution Log and Data Catalog.

• In many organizations, the Data Strategy document is made a part of the Master Test Plan. In such cases, the Master Test Plan template includes a very detailed section called “Test Data Plan”, with sub-sections on the strategy for test data set-up and the plan for various test data set-up activities (list of activities, time lines, estimated efforts and so on), risks and communication plan. The Test Data Management team, based on the data requirements and subsequent analysis of the same, prepares the Data Plan, which in turn is incorporated into the Master Test Plan and then signed off. This also helps the Test Manager to review and monitor the test data management activities.


• While loading production data to the test environment, instead of using the whole data dump, a data sampling technique can be used to derive a production slice. Sampling allows extraction of a small representative set of data from production for use in testing. While adopting data sampling calls for a certain level of technical maturity from the Test Data Management team, the practice enables the teams to work with even smaller volumes of production data without compromising on data requirements coverage.

• Instead of reverting the environment to the previous version of data, tool(s) can be used to restore the used data to its original “unused” condition. In many organizations, existing GUI-based functional automation scripts are used for this purpose. For instance, “open” accounts that have been “closed” during test execution can be “re-opened” from the GUI, through test automation scripts, to make the account numbers reusable.
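The sampling and masking practices above can be sketched together as follows. This is a minimal illustration under stated assumptions, not a description of how Masketeer or any other masking tool works: the stratified sampling by one attribute, the keyed-hash pseudonyms, the key `SECRET_KEY`, the field names and the generated rows are all hypothetical.

```python
import hashlib
import hmac
import random

# Assumption: in practice the key would come from a secured store.
SECRET_KEY = b"illustrative-key"


def mask_value(value, key=SECRET_KEY, length=12):
    """Replace a confidential value with a deterministic pseudonym.

    The same input always gives the same output, so a customer id masked
    in two applications still matches after masking.
    """
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def sample_slice(rows, key, per_group, seed=0):
    """Keep up to `per_group` rows for every distinct value of `key`."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def mask_rows(rows, confidential_fields):
    """Return copies of the rows with the confidential fields masked."""
    return [
        {k: mask_value(v) if k in confidential_fields else v for k, v in row.items()}
        for row in rows
    ]


# Hypothetical production extract: 60 customers across three account types.
production = [
    {"customer_id": f"C{i:04d}", "name": f"Customer {i}", "account_type": t}
    for i, t in enumerate(["savings", "current", "loan"] * 20)
]

slice_rows = sample_slice(production, "account_type", per_group=2)
masked = mask_rows(slice_rows, {"customer_id", "name"})
```

Sampling per account type keeps every type represented in the slice even though only six of the sixty rows are loaded. Deterministic masking is chosen here because the paper stresses synchronization of data across applications; format-preserving or random substitution would be alternatives with different trade-offs.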

On-the-job observations

These are the key on-the-job observations of the practitioners:

• The key to completing the actual data set-up process is effective coordination with all the teams involved and prior communication of all activities.

• Lack of proper connectivity with test regions often hinders test data set-up and/or subsequent validation of the set-up. Connectivity with the environment should be verified early on, before proceeding with the actual data set-up.

• Almost every test region has redundant or unused data, and distributing unused data from one project to another is a very quick and effective way of handling sudden data requests.
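The last observation, together with the Data Distribution Log described under Typical Practices, can be sketched as a small in-memory log. This is only an illustration: the class names `LogEntry` and `DistributionLog`, the fields and the account numbers are hypothetical, and a real log would normally live in a shared document or database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    data_id: str                    # e.g. an account number
    region: str                     # test region holding the data
    characteristics: frozenset      # e.g. {"savings", "open"}
    project: Optional[str] = None   # None means the data is unused


class DistributionLog:
    def __init__(self):
        self.entries = []

    def add(self, data_id, region, characteristics, project=None):
        self.entries.append(
            LogEntry(data_id, region, frozenset(characteristics), project)
        )

    def allocate(self, project, region, needed, count):
        """Tag up to `count` unused records that have all `needed`
        characteristics to `project` and return their ids."""
        needed = frozenset(needed)
        matches = [
            e for e in self.entries
            if e.project is None and e.region == region and needed <= e.characteristics
        ]
        chosen = matches[:count]
        for entry in chosen:
            entry.project = project
        return [entry.data_id for entry in chosen]


log = DistributionLog()
log.add("ACC-001", "Test Env 1", {"savings", "open"}, project="Prj 1")
log.add("ACC-002", "Test Env 1", {"savings", "open"})
log.add("ACC-003", "Test Env 1", {"current", "open"})
log.add("ACC-004", "Test Env 1", {"savings", "open"})

# A sudden request from Prj 2 is served from unused data.
granted = log.allocate("Prj 2", "Test Env 1", {"savings"}, count=2)
# A later request finds nothing left, so no data is handed out twice.
leftover = log.allocate("Prj 3", "Test Env 1", {"savings"}, count=1)
```

Because every record is tagged to at most one project, the same log that speeds up sudden requests also guards against data contention between projects.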

Process Framework

We have segregated the Test Data Management practices into project phases (Planning or Kick-off, Analysis, Design, Build and Maintenance) and defined an ETVX (Entry, Task, Validation, eXit) model for each phase, as described below:

Phase: Planning-cum-Kick-off

Entry criteria:

• Test Data Manager (TDM) identified for test data management services

Tasks:

• Acquire an initial understanding of the test data landscape in the organization through meetings, questionnaires and so on.

• Build templates for:

  1. Data Request form

  2. Data Profile document

  3. Data Requirements document

  4. Analysis Report

  5. Data Strategy document or Data Plan (can be made a subset of the Master Test Plan)

  6. Data Distribution Log (log of request, fulfillment details, region, characteristics of data provided and so on)

  7. Data Catalog

• Identify SPOCs (single points of contact) from each application team, DBAs and so on.

• Conduct meetings with the SPOCs.

• Prepare a list of ongoing projects with start and end dates, applications involved, test regions being used by each project or application and so on.

• Prepare the “Data Landscape” document (a single-stop document containing information like the list of test regions, applications, types of data stores (files or database), typical frequency of data requests for each application and so on).

• Establish SLAs for delivering test data management services.

• Set up a team for Test Data Management (can be a virtual team comprising members from different application teams and DBAs).

Validation:

• Review Test Data Landscape document

• Review Templates

• Review list of ongoing projects

Exit criteria:

• Signed-off Templates

• Signed-off Test Data Landscape document

Work items:

• Questionnaire

• Test Data Landscape document

• Templates

Phase: Analysis

Entry criteria:

• Templates are available

• Signed-off Test Data Landscape document is available

Tasks:

• Carry out a Data Profiling exercise for each of the individual data stores across the enterprise, for ALL regions, including Production (optional, applicable only if it is an initial data set-up or synchronization exercise).

• Assign version numbers to existing data in all environments and enter them in the Data Catalog (optional, applicable only for initial data set-up or synchronization; otherwise, this activity is done only during the Build phase or Maintenance phase, after setting up the test data and validating its correctness).

• Collect Test Data Requirements (can be for specific project(s) or for initial data set-up and synchronization across applications) through calls, meetings, data request forms and so on.

• Consolidate the requirements provided in Data Request forms in the Data Requirements document.

• Update the list of ongoing projects.

• Analyze data requirements, the latest Data Distribution Log (if present) and the existing test bed to identify the following:

  1. Gaps between requirements and current test beds

  2. Gaps between requirements and current data profiles (Is the data requirement similar to the production data profile? Or have some very typical data scenarios in production been missed out? Is the data requirement similar to the data profile of any test environment?)

  3. Impact of data modification on ongoing projects

• Prepare the Analysis report.

• Define Data Security policies: version control, back-up, storage and access policies (one-time activity).

Validation:

• Review Data Profile document (optional, applicable only for initial data set-up or synchronization)

• Review Data Catalog (optional, applicable only for initial data set-up or synchronization)

• Review data requirements (through calls, meetings and formal review of the requirements document)

• Review updated list of projects

• Review Analysis document

• Review Data Security Policy document (one-time)

Exit criteria:

• Reviewed Data Profile documents (optional, applicable only for initial data set-up or synchronization)

• Reviewed Data Catalog (optional, applicable only for initial data set-up or synchronization)

• Signed-off test data requirements

• Reviewed Analysis document

• Signed-off Data Security Policy document (one-time only)

Work items:

• Data Request form(s)

• Data Requirements document

• Data Profile document (optional)

• Data Catalog (optional)

• Analysis document

• Data Distribution Log

• Data Security Policy document

Phase: Design

Entry criteria:

• Signed-off test data requirements are available.

• Signed-off Analysis document is available.

Tasks:

• Decide the strategy for data preparation:

  - Identify test region(s) where data need to be loaded or refreshed

  - Identify methods to be used, for example:

    o Obtain partial data (only the “missing” data) from another region

    o Restore “used” data to original “unused” state

    o Refresh with data dumps (production slice or other regions)

    o Refresh with old back-ups

    o Mask data

    o Create new data

    o Distribute unused data from other projects

  - Identify data sources and providers

  - Identify tools for data extracting, masking, creating, loading and so on

  - Identify the data distribution plan (e.g., tagging account numbers to projects etc.)

  - Identify the coordination and communication plan

• Plan for test design activities: list of activities, time lines, efforts, risks, communication plan etc.

• Create the Data Strategy document or Data Plan (containing both the strategy and planning details)

Validation:

• Review and align the strategy through calls and meetings

• Review Test Data Plan document

Exit criteria:

• Signed-off Data Strategy document or Data Plan

Work items:

• Data Requirements document

• Data Distribution Log

• Analysis document

• Data Strategy document or Data Plan

Phase: Build

Entry criteria:

• Signed-off test data plan is available.

• Environment for data set-up is available, and connectivity with it is established and validated.

• Identified tools are available.

Tasks:

• Execute activities like the following, identified in the test data plan:

  - Take a back-up of data from the source region (Production or other sub-production regions)

  - Carry out masking (optional)

  - Load the data dump (masked or unmasked) to the target region

  - Create data (manually or using tools) in the target region

• Communicate data readiness, along with data distribution details, and request validation of environment and data readiness.

• Take back-ups (for future reuse) of the new data (both databases and files) once data is set up.

• Assign a version number to the back-up and catalog it with a proper description.

• Update the Data Distribution Log.

• Update the Data Profile document, if required. (Especially applicable for initial data set-up or synchronization, wherein at the end of the Build phase, gaps between production and test environments are closed, thus updating the data profile of the test environments.)

Validation:

• Validate data set-up correctness and readiness

• Review updated Data Distribution Log

• Review data version and updated catalog

• Review Data Profile document (optional, applicable in specific cases only)

Exit criteria:

• Signed-off test environment

• Reviewed updated Data Distribution Log

• Reviewed updated Data Catalog

• Reviewed updated Data Profile

Work items:

• Data Plan

• Data Catalog

• Data Distribution Log

• Data Profile document (optional, applicable in specific cases only)

Phase: Maintenance

Entry criteria (any of the following):

• Simple data requests

• Late change in data requirements (to be captured through a CR)

• Sudden or unplanned data requirements during test execution

• Scheduled perfective maintenance

Tasks:

Support for change requests, unplanned data needs, and problem reports or incidents on data delivered:

• Create or update the Data Requirements document

• Assign a priority to the request, in case of multiple requests

• Analyze requirements to evaluate:

  - If the data requirements can be furnished from the existing test bed by data reuse, modification and other means

  - If the required data can be shared from another project with the same data characteristics

• Perform the necessary steps, for example:

  - Identify data that may be reused or shared

  - Modify data

• Communicate data readiness, along with data distribution details, and request validation of environment and data readiness.

• Take a back-up (for future reuse) of the new data once data is fixed or set up.

• Assign a version number to the back-up and catalog it with a proper description.

• Update the Data Distribution Log.

• If necessary (i.e. where “quick” request resolutions are deemed unfeasible), redirect the request to be serviced through the full Analysis, Design and Build phases.

Scheduled perfective maintenance:

• Review the status of ongoing projects and decide, upon analysis, the schedule for maintenance.

• Communicate the schedule of maintenance to all project or application teams.

• Carry out a Data Profiling exercise for each of the individual data stores across the enterprise, for ALL regions, including Production.

• Assess gaps in the test bed and close them.

• If required, refresh the test region with an old back-up of the test region so as to restore it to its original state, where all data are usable as-is and data for all applications are in sync.

• Update the Data Distribution Log with details of the refresh (region(s) refreshed, date of refresh, date of the dump with which the environment has been refreshed and so on).

Validation:

• Validate data set-up correctness and readiness

• Review updated Data Distribution Log

• Review updated Data Profile document (applicable only for perfective maintenance)

• Review data version and catalog

Exit criteria:

• Sign-off on the test environment

• Updated Data Distribution Log

• Reviewed Data Catalog

• Reviewed updated Data Profile document (only for perfective maintenance)

Work items:

• Data Request form

• Data Requirements document

• Data Distribution Log

• Data Profile document (only for perfective maintenance)
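The recurring Build and Maintenance task of assigning a version number to each back-up and entering it in the Data Catalog can be sketched as below. This is a minimal sketch under assumptions: versions are numbered per region starting at 1, and the class names, fields and back-up paths are hypothetical rather than taken from the framework.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    region: str
    version: int
    backup_location: str
    description: str
    created: date


class DataCatalog:
    def __init__(self):
        self._entries = {}  # region -> list of entries, oldest first

    def register(self, region, backup_location, description, created=None):
        """Assign the next version number for the region and catalog the back-up."""
        versions = self._entries.setdefault(region, [])
        entry = CatalogEntry(
            region=region,
            version=len(versions) + 1,
            backup_location=backup_location,
            description=description,
            created=created or date.today(),
        )
        versions.append(entry)
        return entry

    def latest(self, region):
        """Most recent back-up for a region, e.g. for a quick restore."""
        return self._entries[region][-1]

    def get(self, region, version):
        """Look up a specific version, e.g. to refresh with an old back-up."""
        return self._entries[region][version - 1]


catalog = DataCatalog()
first = catalog.register(
    "Test Env 1", "/backups/env1/v1.dump", "Initial synchronized set-up"
)
second = catalog.register(
    "Test Env 1", "/backups/env1/v2.dump", "After Prj 2 data creation"
)
restore_from = catalog.get("Test Env 1", 1).backup_location
```

Keeping the description alongside the version is what makes an old back-up usable later: during perfective maintenance the team can pick the version whose description matches the state the region should be restored to.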

Benefits

The deployment of the process framework described above has been seen to yield the following benefits:

• Reduction in cost to the organization by as much as 30 - 40% from:

  o Reduced data set-up effort due to streamlined processes

  o Reduced rework due to “first-time correct” deliveries

  o Increased use of tools for data creation

  o Improved data reuse (approximately 80% data reuse) using previously backed-up data versions

  o Reduced data redundancy through environment sharing between projects

  o Reduced number of environment refreshes due to exploration of other options like data sharing, creating only the missing data and so on

• Reduced disruption to Production services due to fewer requests for production data dumps

• Reduced time-to-market: at the least, no delays due to late test data delivery

• Improved data quality owing to thorough analysis of data requirements, and thus improved testing

• Increased data security and recoverability from a well-defined data security policy

References/Future Study

[1] We spoke with practitioners across TCS, i.e. Test Data Managers and members of multiple teams that are providing Test Data Management services.

[2] Masketeer – TCS tool for test data masking - http://www.tcs-trddc.com/TECS'09/masketeer.pdf

[3] Testify – TCS tool for test data creation

Acknowledgments

We acknowledge all the people in TCS, Vinita M and Chaitra Puttaswamy in particular, who helped us write this white paper by providing valuable information and suggestions.


All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Copyright © 2009 Tata Consultancy Services Limited

Tata Consultancy Services is an IT services, business solutions and outsourcing organization that delivers real results to global businesses, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled services delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development.

A part of the Tata Group, India’s largest industrial conglomerate, TCS has over 143,000 of the world's best trained IT consultants in 42 countries. The company generated consolidated revenues of US $6 billion for the fiscal year ended 31 March 2009 and is listed on the National Stock Exchange and Bombay Stock Exchange in India. For more information, visit us at www.tcs.com



gained through the numerous mission critical projects that have been successfully delivered to our eminent BFS clients across the Globe. This is strengthened by the BFS Industry Practice, which has professionals who have served across Global, Regional and National Financial Institutions in various lines of BFS business and operations. Our global focus, deep industry knowledge and commitment to understanding and satisfying client needs have been critical to our successes.

The BFS Industry Practice is organised to deliver value to our clients across the multiple BFS-Industry Solution units, Insurance ISU, New Growth Markets and Emerging Market IOUs.
