IM02
How to manage your Test Data on
zEnterprise
18 - 20 September, 2012
IBM Forum Brussels
Notices
This information was developed for products and services offered in the U.S.A.
Note to U.S. Government Users Restricted Rights — Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
This presentation contains trade-marked IBM products and technologies. Refer to the
following Web site:
Agenda
Data Governance - Focus on Test Data creation and masking
Obtaining test data. Some common practices.
Obtaining test data. Best practices.
InfoSphere Optim Test Data Management / Data Privacy
A Smarter Planet harnesses today’s information
explosion for business benefit…
Instrumented
Interconnected
Intelligent
…creating a need for better
Information Governance
Streamlining processes to
manage business growth with
consistency
Ensuring compliance with
policies, laws and regulations
Controlling costs and
Information
Governance
Govern
Quality Security & Privacy Lifecycle Standards Transactional & Collaborative Applications Business Analytics Applications External Information Sources
Success requires governance across the “Information
Supply Chain”
Analyze
Integrate
Manage
Cubes Big Data Master Data Content Data Streaming Information Data Warehouses Content AnalyticsRequirements for managing data across its lifecycle
Capture & replay production workloads
Define policies Report & retrieve archived data
Enable compliance with retention &
e-discovery Move only the needed information Integrate into single
data source
Create & refresh test
data Manage data growth
Classify & define data and relationships
Develop & test database structures/ code Enhance performance Discover where data resides
Develop &
Develop &
Test
Test
Discover &
Discover &
Define
Define
Optimize &
Optimize &
Archive
Archive
Consolidate &
Consolidate &
Retire
Retire
Information Governance Core Disciplines
Lifecycle Management
Increasing Risk
Mandatory to protect
data and comply with
regulations
Increasing Costs
Defects are caught
late in the cycle
Organizations continue to be challenged with
building quality applications
Time to Market
Lack of realistic test
data and inadequate
Increasing Risk
45,000+
Number of sensitive
records exposed to 3
rdparty during testing
c70%
companies use actual
customer data to test
applications
aTime to Market
37%
Satisfied with speed of
software development
f30-50%
Time testing teams
spend on setting up test
environments, instead of
testing
bIncreasing Costs
$300 billion
Annual costs of
software-related downtime.
d32%
Low success rate for
software projects
eOrganizations continue to be challenged with building
quality applications
a. The Ponemon Institute. The Insecurity of Test Data: The Unseen Crisis
b. NIST, Planning Report. The Economic Impacts of Inadequate Infrastructure for Software Testing
c. Federal Aviation Administration: Exposes unprotected test data to a third party http://fcw.com/articles/2009/02/10/faa-data-breach.aspx d. The Standish Group, Comparative Economic Normalization Technology Study, CHAOS Chronicles v12.3.9, June 30, 2008
e. The Standish Group, Chaos Report, April 2009
Vulnerable non-production environments at risk
Most ignore security in non-production environments
70%
of organizations surveyed use live
customer data in non-production
environments (testing, Q/A, development)
Database Trends and Applications. Ensuring Protection for Sensitive Test Data
50%
of organizations surveyed have no way
of knowing if data used in test was
compromised
The Ponemon Institute. The Insecurity of Test Data: The Unseen Crisis
52%
of surveyed organizations
outsource development
The Ponemon Institute. The Insecurity of Test Data: The Unseen Crisis
$194
per record
cost of a data breach
Align application performance to business processes Ensure business continuity
Respond quickly and accurately to audit and discovery requests
Leverage existing investments in applications, databases and storage Reduce resource requirements for key IT operations
Profit from superior application performance and availability
Provision resources to meet priority business needs Automate data retention to support compliance initiatives Eliminate budget variances
Streamline application and database upgrades Speed disaster recovery
Simplify database administration
Reclaim underutilized capacity
Protect data privacy, integrity, and security
CIO
Business
IT
Impact Across the Enterprise – The benefits of TDM
strategy
Agenda
Data Governance - Focus on Test Data creation and masking
Obtaining test data. Some common practices.
Obtaining test data. Best practices.
InfoSphere Optim Test Data Management / Data Privacy
Test data creation is often accomplished through
cloning
Positives
Negatives
• Simple to do
• Requires little knowledge of the data model or infrastructure
• Creates an exact duplication of production
• Uses significant storage
- Much more than team needs
- Often done once and not for each team member
• Data is production ready and therefore a privacy risk
• Takes significant amounts of time to create • No way to compare to original after test is
complete
• Cannot span multiple data sources/applications • Developer/Tester downtime when sharing
data accessibility
Clone
500 GB
500 GB
Actual Data Burden = Size of production database + all replicated clones
The Data Multiplier Effect
500 GB
500 GB
500 GB
500 GB
500 GB
500 GB
Development
Test
User
Acceptance
Production
500 GB
500 GB
Backup
500 GB
500 GB
Disaster
Recovery
3000 GB
3000 GB
Total
Generating syntheticTest data
Positives
Negatives
• Safe • Resource-intensive:
- Huge commitment from DBA
- Deep knowledge of database schema
• Tedious: DBA’s must intentionally include errors to ensure robust testing process.
• Created data does not always reflect the integrity of the original data.
• Time-consuming: process is slower and can be error-prone.
Test data creation by writing SQL
Positives
Negatives
• ? • Write and maintain SQL.
• Complex and subject to change. • Referential Integrity? Right data? • Expensive, dedicated staff.
• Cannot span multiple data sources/applications. • Developer/Tester downtime when sharing
data accessibility.
SQL
Agenda
Data Governance - Focus on Test Data creation and masking
Obtaining test data. Some common practices.
Obtaining test data. Best practices.
InfoSphere Optim Test Data Management / Data Privacy
Test Data Management – Concepts
• Test Data Management (TDM) refers to the need to manage data used in various
pre-production environments and is a vital part of Application Quality & Delivery.
• Extract production data into referentially intact data subsets to be used to support
application data in other environments.
• De-identify (mask) extracted production data to protect privacy.
• Compare “before” and “after” images of test data.
Test Data Management Best Practices
Inspect/Browse & Seed Test Cases
Inspect/Browse & Seed Test Cases
Run Test
Production
Production
Correct Production Errors
50 GB
50 GB
50 GB
2 TB
2 TB
3 TB
Unit Integration Performance Function UAT4 TB
ProductionCompare Before and After Results Compare Before and
After Results Source 1 Source 2 Compare Process Subset Privatize Subset Privatize
Refresh Test Data Refresh Test Data
Gold Copy
Extract
& Subset
Extract
& Subset
Convert
& Mask
Convert
& Mask
Compare
& Audit
Compare
& Audit
Subset
Criteria
Subset
Criteria
Subset and Privatized Test DataLoad &
Distribute
Load &
Distribute
Masking
Rules
Masking
Rules
& Target
Source
Source
& Target
DB List
& Auth
DB List
& Auth
TDM Processes
Secured Lock-down Environment
Refresh test data
Developer Developer Developer Developer Tester Tester Tester Tester Tester TesterTraining Database
75 GB
Dev Database
25 GB
Refresh Refresh RefreshTest Database
50 GB
Test Environments
Example process
Speed delivery, reduce costs and improve quality while
reducing risk and increasing compliance with Test Data
Management
Without Test
Data Management
With Test
Data Management
Sits in queue for days and take several days to create
Submits request for test data
Sits in queue for days
Create test data
Tester
DBA
Takes several days to create
DBA Create or Refresh test data
Tester -Use test data in testing
-Request data refresh
Takes hours to create
Tester -Use test data in testing Refresh test data
-Submits request for test data
Sits in queue for days
Create test data
Tester
Agile development relies on agile testing, agile testing
relies on continuous access to test data
Organization
Process
Technology
an agile test organization
and testers need
continuous access to
test data
with agile development
you have continuous
integration and delivery
and have to test often
test data management
software needs to support
agile method by
streamlining access to
test data and having
insight into test data
Agenda
Data Governance - Focus on Test Data creation and masking
Obtaining test data. Some common practices.
Obtaining test data. Best practices.
InfoSphere Optim Test Data Management / Data Privacy
Q & A
Data
Governance
Manage Data
Lifecycle
•Data Retention •Data RetirementSecure
•Prevent Access •Restrict Access • Monitor AccessProtect &
Privacy
•Mask Data •Encrypt DataAudit
•Audit Privileges •Audit Users •Audit AccessDB2/RACF
Security
Tivoli zSecure
Audit
Data Encryp.
for IMS / DB2
Optim TDM
and DP
Optim
Data
Growth
Solution
Reduce risk from Security breaches
Comply with
regulatory compliance requirements
Archive inactive data and reduce amount of data exposed and requiring protections.
IBM is only solution provider with an end to end comprehensive solution
Protect sensitive customer data and employee data
Enterprise Data Governance for System z
Guardium
for z
IBM InfoSphere Optim supports the heterogeneous enterprise
Single, scalable, heterogeneous information lifecycle management solution provides a central point to
deploy policies to extract, archive, subset, and protect application data records from creation to deletion
Capture & Replay
Manage Test Data Archive
Discover
Partner
ISPF
Workstation
Repository Services
Data Access Services
Archiving Services
Subsetting Services
Data Privacy Services
Open Data Management
Security
Server
Storage Independent Archive
Storage Independent Archive
Index Index Data Data Metadata Artifacts
Storage Independent ArchiveExtract & Archive Files Index Index Data Data Metadata Artifacts Data Data Metadata Artifacts ODBC/JDBC
DB2
Optim
Directory
IMS
IMS Native Access
Orders Products
DB2
DB2 Access
Employee Payroll
VSAM / SEQ Files
Native Access
Customers Payments
An ISPF workbench software running under
Z/OS utilized to design, test and deploy
projects to the OPTIM™ Server. The ISPF
workbench software enables either Online and
Batch (JCL) execution.
ISPF
Workstation
Z/OS
Repository Services
Data Access Services
Archiving Services
Subsetting Services
Data Privacy Services
Open Data Management
Security
Server
Storage Independent Archive
Storage Independent Archive
Index Index Data Data Metadata Artifacts
Storage Independent ArchiveExtract & Archive Files Index Index Data Data Metadata Artifacts Data Data Metadata Artifacts ODBC/JDBC
DB2
Optim
Directory
Repository Services
Data Access Services
Archiving Services
Subsetting Services
Data Privacy Services
Open Data Management
Security
Store and retrieve metadata information, project information, archive catalog in the Optim
Directory
IMS
IMS Native Access
Orders Products
DB2
DB2 Access
Employee Payroll
VSAM / SEQ Files
Native Access
Customers Payments
OPTIM Server / Repository
Utilizes IMS provided drivers to access
data. Metadata is captured via
copybook imports (COBOL or PL/1)
Utilizes VSAM Native Accesss to access data. Metadata
is captured via copybook imports (COBOL or PL/1)
Utilizes SQL to access data. Provides
process to capture metadata from DB2
catalog.
Access source or destination Databases via data access with specific drivers per file type Extract and restore relationally intact Business Objects across multiple DB2 Databases, IMS and VSAM (e.g. Orders from DB2, Customers from VSAM and Credit Cards from IMS)
Store and retrieve, restore, delete, compress data, metadata and artifacts (e.g. external documents as BLOBs) related to Business Objects
Consistently and predictably mask and propagate data for the purpose of test data management with data compliance
Enable relational access to Archived Data via ODBC/JDBC and SQL-92. Can be used in conjunction to remote Database services of ODM to integrate with Enterprise Data Access for Business Intelligence
Provide functional and object security to separate product and data access by role and responsibilities using RACF
Optim Captures the Complete
Business Object
DBA view Referentially-intact subset of data DBA view Referentially-intact subset of data Application view Application-level business rules for data relationshipsApplication view
Application-level business rules for data relationships IMS IMS DB2 DB2 IMS IMS VSAM VSAM Federated access to data and metadata Federated access to data and metadata Related LUW Files or Documents Related LUW Files or Documents
Business Object : Represents application data record – payment, invoice, customer
• Referentially-intactsubset of data across related tables and applications; includes metadata, DDL, Reference + Transaction.
Benefit:Referential Integrity: Ensure data is captured and masked consistently
Complete Data
RI Preserved!
OS Independent
DB independent
ODBC Accessible
IBM InfoSphere Optim Test Data Management
Solution
Requirements
Benefits
• Deploy new functionality more quickly and with improved quality
• Easily refresh & maintain test environments
• Protect sensitive
information from misuse & fraud with data masking • Accelerate delivery of test
data through refresh
• Create referentially intact, “right-sized” test databases • Automate test result
comparisons to identify hidden errors
• Protect confidential data used in test, training & development
• Shorten iterative testing cycles and accelerate time to market
Create “right-size”
production-like environments for application testing
Test Data Management 100 GB 100 GB 25 GB 50 GB 50 GB 25 GB 2TB 2TB Development Unit Test Training Integration Test -Subset -Mask Production or Production Clone
InfoSphere Optim TDM supports data on distributed platforms (LUW) and z/OS.
Out-of-the-box subset support for packaged applications ERP/CRM solutions as well as :
Other
Other
-Compare -Refresh
InfoSphere Optim Test Data Management
Standard methodology
Production Test Application Extract / privatize production data Load test database Test Extract file Compare results Results? Success Unsuccessful Edit data Deploy ApplicationRefresh and Retest
Optim fits with your
testing methodology
– Extract
– Subset
– Privatize
– Load
– Edit
– Compare
Automate all or part of
the process
Enterprise Testing Solution with Rational and InfoSphere Optim
Building better quality applications
Initiate Data Extract Scripts Subset & Mask Production Data for Testing
Execute Automated Test Routines Compare Before
& After Data Go Production! Design & Manage
Test Campaign
Browse & Edit Test Data InfoSphere Optim InfoSphere Optim InfoSphere Optim InfoSphere Optim InfoSphere Optim InfoSphere Optim Fail Refresh Masked Test Data
InfoSphere Optim
InfoSphere Optim
Comprehensive software quality
process to minimize cost and shorten
development cycles
– Manage test labs
– Create realistic test environments
from production data
– Ensure protection of sensitive data
– Manage unit, functional and
performance testing and quality test
cases
Streamline your test data
management processes and deliver
your project sooner and with fewer
defects
The process : Access Definiton
DB2 DB2 defined relationship
OPT Relationship defined with Optim
Only records where State = “GA”
Extract Process
Extract from source tables
Extract data and/or object definitions
EXTRACT
Extract
File
Process Report -- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - -- --- -- --- -- --- -- --- -- --- ----CUSTOMERS ORDERS DETAILS PRODDBUse BROWSE to verify extracted data
Point &
Insert Process : Populate Destination Tables
Table Map
– Table names need not match
– Change qualifier and/or table name
Insert Process : Populate Destination Tables
Column Map
– Map unlike column names
– Transform/mask sensitive data
– Datatype conversions
– Column-level date aging
Literals Special Registers Expressions Default Values User exits
Edit / Browse : Traditional vs. Relational Tools
One table/view at a time No edit of related data
from multiple tables
FIND CUSTOMER NOTE INFO EXIT TABLE FIND ORDERS NOTE INFO EXIT TABLE FIND DETAILS NOTE INFO
EXIT TABLE
CUSTOMERS
ORDERS
DETAILS
... ... ... ... ...Single Table Editors
The Relational Editor
Simultaneous browse/edit
of related data from
Editing Data
Edit data to:
• Insert Rows
• Delete Rows
• Update Rows
Relationally Joined Data
Browse or edit related rows
Scroll of higher-level table automatically synchronizes
Commit/Restore
Commits are automatically made to the database when you move your
pointer to a different row
– Each instance of a commit counts as an undo level
Restore changes to a row, table or fetch set
Backing Out Changes - Row Level
Undo removes last change made to the
current row
Undo… brings up a row list and lets you
select how far back you want to restore
the current row
Undo All removes all changes made to the
Challenges of Enterprise Data Privacy
Multi-platforms
Relational database applications in the enterprise
– Complex data model
– Multiple databases
– Legacy data components
– Interconnected applications
Distributed work teams
– Employees and contractors
– Global 24 x 7 operations
What is data masking?
Definition
Method for creating a structurally similar but inauthentic version of an organization's data. The purpose is to protect the actual data while having a functional substitute for occasions when the real data is not required.
Requirement
Effective data masking requires data to be altered in a way that the actual values cannot be determinedor reengineered, functional appearance is maintained.
Other Terms Used
Obfuscation, scrambling, data de-identification
Commonly masked data types
Name, address, telephone, SSN/national identity number, credit card number
Methods
– Static Masking: Extracts rows from production databases, obfuscating data values
that ultimately get stored in the columns in the test databases
– Dynamic Masking: Masks specific data elements on the fly without touching applications or physical production data store
Statically mask data in non-production databases
Patient No
123456
SSN
333-22-4444
Name
Erica Schafer
Address
12 Murray Court
City
Austin
State
TX
Zip
78704
Patient No
112233
SSN
123-45-6789
Name
Amanda Winters
Address
40 Bayberry Drive
City
Elgin
State
IL
Zip
60123
Statically mask Mask data in non-production databases such as test and development
Improve security of non-production environments
Facilitate faster testing processes with accurate test data
Support referential integrity
Optim Data Privacy Solution
Production
Contextual,
Application- Aware,
Persistent Data
Masking
Contextual,
Application- Aware,
Persistent Data
Masking
VSAM
IMS
DB2
Test
VSAM
IMS
DB2
Substitute confidential information with fictionalized data
Deploy multiple masking algorithms
Provide consistency across environments and iterations
Enable off-shore testing
Masked fields
are consistent
Data is
masked
132009824 157342266 SSN#s 132009824 157342266 SS#sDB2
323457245 134235489 SSN#s 323457245 134235489 SSN#sClient Billing Application
Consistent mapping Across the enterprise
During Extract Process
Or
Standalone Convert Process
Or
During Insert/Load Process
Transform or Replace sensitive data using
Standard mapping rules:
Literals, Special Registers,
Expressions, Default Values,
Look-up tables
Complex mapping rules:
User exits
De-Identify test data
Production Data Extract and Convert Masked Test Data
Optim Data Privacy in Application Testing
Extract
File
-- --- -- --- -- --- ----Transform / mask sensitive data -- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL TESTDB -- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL QADBLOAD
INSERT/
UPDATE
Load FilesExtract a relationally intact subset from production database(s)
• Extract data and/or object definitions
• Define a new set of test tables
• Apply masking during population process
• Extract file may be reused but contains un-Masked data
• Good practice for testing masks
-- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - ----CUSTOMERS ORDERS DETAILS -- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - -- --- -- --- -- --- -- --- -- --- ----CUSTOMERS ORDERS DETAILS -- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL NewDB
Create
Extract
File
-- --- -- --- -- --- --- --- --- --- --- ---CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL TESTDB -- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL QADBLOAD
INSERT/
UPDATE
Load Files Transform / mask sensitive data Extract a relationally intact subsetfrom production database(s)
• Extract data and/or object definitions in pre-masked file
• Use pre-masked Extract file to create new set of tables
• Convert Pre-masked extract file data into second masked extract file
• Share masked extract file to be reused for population step
• Good practice for testing masks using COMPARE
-- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - ----CUSTOMERS ORDERS DETAILS -- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - -- --- -- --- -- --- -- --- -- --- ----CUSTOMERS ORDERS DETAILS
Masked
Extract
File
-- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL NewDBCreate
Optim Data Privacy in Application Testing
Extract
File
-- --- -- --- -- --- ----Transform / mask sensitive data -- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL TESTDB -- ---- ---- ---- --- ----CUST -- ---- ---- ---- --- ----ORD -- --- -- --- ----DETL QADBLOAD
INSERT/
UPDATE
Load FilesExtract a relationally intact subset from production database(s)
• Most Secure Approach
• Extract data only
• Convert during extract
•Extract file already contains masked data
•Can be shared with testers to reuse
-- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - ----CUSTOMERS ORDERS DETAILS -- --- -- --- -- -- -- -- - -- -- -- -- - -- -- -- -- - -- -- -- -- - -- --- -- --- -- --- -- --- -- --- ----CUSTOMERS ORDERS DETAILS
Transformation Techniques
String literal values
Character substrings
Random or sequential numbers
Arithmetic expressions
Concatenated expressions
Date aging
Lookup values
Intelligence
IBM InfoSphere Optim Data Masking
• Protect sensitive
information from misuse and fraud
• Prevent data breaches and associated fines
• Achieve better information governance
• Protect confidential data used in test, training & development systems • Mask data on screen in
applications
• Implement proven data masking techniques • Support compliance with
privacy regulations
• Solution supports custom & packaged ERP
applications
Requirements
Benefits
De-identify sensitive information with realistic but fictional data
Personal identifiable information
is masked with realistic but
fictional data
JASON MICHAELS
Contextually accurate masked data facilitates
business processes
PersNbr FstNEvtOwn LstNEvtOwn
27645 Elliot Flynn 27645 Elliot Flynn
Event Table
Event Table
PersNbr FstNEvtOwn LstNEvtOwn
10002 Pablo Picasso 10002 Pablo Picasso
Event Table
Event Table
Personal Info Table
Personal Info Table
PersNbr FirstName LastName
08054 Alice Bennett 19101 Carl Davis
27645 Elliot Flynn
Personal Info Table
Personal Info Table
PersNbr FirstName LastName
10000 Jeanne Renoir 10001 Claude Monet 10002 Pablo Picasso
Referential integrity is maintained with key propagation
Patient Information
Patient Information
Patient Information
Patient No. SSN Name AddressCity State Zip
Patient No. SSN
Name Address
City State Zip
112233 123-45-6789 Amanda Winters 40 Bayberry Drive Elgin IL 60123 123456 333-22-4444 Erica Schafer 12 Murray Court Austin TX 78704
Data is masked with contextually correct data to preserve integrity of data
Maintain value of test data Reduce risk of data breaches
Satisfy Privacy regulations
• Generic mask • Dates
• User defined • Arithmetic expressions
• Lookup values
• Business data types (CCN, NID) • String literal values
• Character substrings & concatenation • Random or sequential numbers
Street Address/City/State/Zip Code Data Sets
6767 Rte 10 S 12 Buttercup Ln Street 08594 NJ Princeton 105,333 $8,777,733,811 44101 OH Cleveland 54,999 $534,674,233 Zip Code State City Customers Total Assets1) Client is a Bank who wishes to mask its assets by location
2 Applegarth Ln 12 Street Road 3526 Diamond Rd 12 Roden Dr 288 Helm St 04011 ME Brunswick 89101 NV Las Vegas 98101 WA Seattle 90001 CA Los Angeles 53201 WI Milwaukee 21 Street Rd 3526 Diamond Rd Street 89101 NV Las Vegas 105,333 $8,777,733,811 98101 WA Seattle 54,999 $534,674,233 Zip Code State City Customers Total Assets 2) Optim provides corresponding Street
Address/City/State/Zip Codes for masking
New Table with Masked Data
Address Lookup
Table
3) Leverage Multiple Column Replacement. Entire address row can be masked with a valid Coding Accuracy Support System
(CASS) address using enhanced random lookup function
© 2012 IBM Corporation
First Names and Last Names Data Sets
Stacey Dave Danielle Bob John
First Name Last Name GPA High School Advisor State
Paul Smith 3.2 Princeton
Johnson NJ
Kate Jones 2.7 Albany Kline NY
First Name Last Name GPA High School Advisor State
Dave Nelson 3.2 Princeton
Johnson NJ
Stacey Reese 2.7 Albany Kline
NY
1) Client is a University who wishes to mask
the first and last name fields in their
admissions database
2) Optim now has a first name lookup
table with over 5,000 male/female names
and a last name lookup table with over
80,000 names
Test Database Reese Howell Kline Nelson Newton First Name Lookup Table Production Database Last Name Lookup Table3) Use Lookup Tables to randomly
replace table first and last names
Data Privacy Transformation Library Functions
TRANS SSN
Generates valid and unique U.S. Social Security Number (SSN).
By default, algorithmically generates consistently altered destination SSN based on source SSN.
Can also generate a random SSN when the source data does not have an SSN value or when there is no need for transforming the source SSN in a consistent manner.
TRANS CCN
Use the TRANS CCN function to generate a valid and unique credit card number (CCN).
By default, randomizes entire string, can also randomize parts of the credit card (example- preserve cc type).
TRANS EML
Generates a random e-mail address.
An e-mail address consists of two parts, a user name followed by a domain name, separated by ‘@’. For example, user@domain.com.
JASON MICHAELS
Without Key Propagation…
Cust ID Item # Order Date
27645 80-2382 20 June 2004
27645 86-4538 10 October 2005
Customers Table
Orders Table
Cust ID Name Street
08054 Alice Bennett 2 Park Blvd 19101 Carl Davis 258 Main
27645 Elliot Flynn 96 Avenue
Original Data
Cust ID Item # Order Date
27645 80-2382 20 June 2004
27645 86-4538 10 October 2005
Customers Table
Orders Table
Cust ID Name Street
10000 Auguste Smith Mars23 10001 Claude Jones Venus24
10002 Pablo Adams Saturn25
Without Key Propagation
Now these are Orphans!
Masking with Key Propagation
Cust ID Item # Order Date
27645 80-2382 20 June 2004
27645 86-4538 10 October 2005
Customers Table
Orders Table
Cust ID Name Street
08054 Alice Bennett 2 Park Blvd 19101 Carl Davis 258 Main
27645 Elliot Flynn 96 Avenue
Original Data
Cust ID Item # Order Date
10002 80-2382 20 June 2004
10002 86-4538 10 October 2005
Customers Table
Orders Table
Cust ID Name Street
10000 Auguste Smith Mars23 10001 Claude Jones Venus24
10002 Pablo Adams Saturn25
De-Identified Data
Referential integrity is maintained
Using Custom Masking Exits
Apply complex
data transformation algorithms
and populate the
resulting value to the destination column
Selectively
include or exclude rows
and apply logic to the
masking process
Valuable where the desired transformation is beyond the scope of
supplied Column Map functions
Example:
Generate a value for CUST_ID based on customer location,
average account balance, and volume of transaction activity
Agenda
Data Governance - Focus on Test Data creation and masking
Obtaining test data. Some common practices.
Obtaining test data. Best practices.
InfoSphere Optim Test Data Management / Data Privacy
66
Which IBM Software is key for delivering a
completly masked Test data subset?
A. Combination of answer B & D
B. InfoSphere Optim Test Data Management
C. InfoSphere Guardium
D. InfoSphere Optim Data Privacy
Participate in the System z Expert and Superhero contest!
Fill in your answer to the question below on the scorecard and
deposit your card in the box!
More information on zEnterprise
IBM zEnterprise / System z Redbooks Portal:
http://www.redbooks.ibm.com/portals/systemz
IBM zEnterprise Announcement Landing Page:
ibm.com/systems/zenterprise196
IBM zEnterprise HW Landing Page:
ibm.com/systems/zenterprise196
IBM zEnterprise Events Landing Page:
ibm.com/systems/breakthrough
IBM Software:
ibm.com/software/os/systemz/announcements
IBM System Storage:
ibm.com/systems/storage/product/z.html
IBM Global Financing:
ibm.com/financing/us/lifecycle/acquire/zenterprise/
Global Technology Services:
Thank You
Merci
Bedankt
Gracias!
Obrigado
Danke
Japanese English French Russian German Nederlands Spanish Brazilian Portuguese Arabic Traditional Chinese Simplified Chinese Thai