Data Masking Checklist

Selecting the Right Data Masking Tool

Selecting Your Masking Tool

Compliance with current data protection regulations and guidelines is now mandatory. Non-compliance not only risks heavy fines and reputational damage, but also leaves your sensitive data inadequately protected against breaches. Traditionally, many organizations have used manual techniques to mask (also described as de-sensitizing, de-identifying or obfuscating) full copies of production data for use in development and testing. However, this is a labour-intensive, time-consuming and costly process that is prone to human error and inconsistency. As a result, teams are often provided with poor-quality data that is slow and expensive to create. This lengthens test cycles as testers wait for data, and reduces quality, so that more potentially costly defects make it into production. Organizations are therefore increasingly looking to data masking tools to improve the quality of their data and reduce the length and cost of their test cycles. However, there are a number of data masking tools on the market, so how do you choose the right one for your project?

Below, we have set out a matrix listing the features you need to consider to ensure that your testing and development teams are provided with high-quality, compliant test data that can increase the quality and reduce the cost of your project. In each case, we have noted how important the feature is and how it can help solve the problems you are likely to face in the real world.

Masking Features Weighting In The Real World Y/N

Application and Database Integrity

Weighting: Mandatory

Consistent masking across multiple applications is essential for integration and end-to-end testing.

Cross-Platform Integrity

Weighting: Mandatory

Most large enterprises feed data across multiple platforms and technology stacks. Consistent masking across multiple platforms is essential for integration and end-to-end testing.

Cross-System Data Relationship Discovery and Definition

Weighting: Medium

This is usually part of the set-up and understanding of applications. In our experience, these relationships can quickly be derived from pattern matching inside the data, from naming standards inside the catalogue and from documentation.

These relationships are interesting. However, the more focused ‘PII and Financial Discovery Scanning’ (see below) is far more important, as it takes far more time, is more prone to error and is subject to change over time.

PII and Financial Discovery Scanning

Weighting: Mandatory

The ability to scan all, or a percentage, of the data across multiple systems and automatically identify which data is potentially problematic is essential. Relying on users’ interpretation of reports and screens is not good enough to discover where hidden data exists in the system. The alternative is a ‘double blind’ manual approach, in which multiple users must independently arrive at the same conclusion about which data needs to be masked. However, this process is extremely time-consuming and can result in project failure.
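As a rough illustration of automated discovery scanning, the sketch below checks sampled rows against a couple of regular expressions and flags any column where most non-empty values look like PII, whatever the column is called. The pattern set, the `scan_columns` function and the 0.6 threshold are all assumptions made for this example; a real scanner would need far more patterns plus validation such as checksum tests.

```python
import re

# Hypothetical, deliberately small pattern set (an assumption for this sketch).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 19 digits, optionally separated by spaces or hyphens.
    "card_number": re.compile(r"(?:\d[ -]?){12,18}\d"),
}


def scan_columns(rows, threshold=0.6):
    """Return {column: [pattern names]} for columns whose non-empty values
    match a pattern at least `threshold` of the time."""
    totals = {}  # column -> number of non-empty values seen
    hits = {}    # (column, pattern name) -> number of matching values
    for row in rows:
        for column, raw in row.items():
            value = "" if raw is None else str(raw).strip()
            if not value:
                continue
            totals[column] = totals.get(column, 0) + 1
            for name, pattern in PII_PATTERNS.items():
                if pattern.fullmatch(value):
                    hits[(column, name)] = hits.get((column, name), 0) + 1

    flagged = {}
    for (column, name), count in hits.items():
        if count / totals[column] >= threshold:
            flagged.setdefault(column, []).append(name)
    return {column: sorted(names) for column, names in flagged.items()}


# Sampled rows: the sensitive values hide in innocently named columns.
sample_rows = [
    {"cust_id": "1001", "notes": "jane.doe@example.com", "ref": "4111 1111 1111 1111"},
    {"cust_id": "1002", "notes": "j.smith@example.org", "ref": "5500-0000-0000-0004"},
    {"cust_id": "1003", "notes": "call back Tuesday", "ref": "4012888888881881"},
]
flagged_columns = scan_columns(sample_rows)
```

Here the `notes` and `ref` columns are flagged even though their names give no hint of their content, which is the kind of hidden data a review of reports and screens tends to miss.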

Vendor-Provided App Packs of Rules

Weighting: Low

Relying on a pre-defined set of rules provided by a vendor means that you are relying on their knowledge of a specific ERP. This is fraught with danger; remember, it is you who is liable if there is a data leak!

Also, these app packs do not take account of local customizations within your applications, such as the way you use the system or how flex fields are normally used. It is better to use a robust PII scanning tool to make sure nothing is missed when masking.

Integration into BAU (Business as Usual) Development Structures

Weighting: High

The masking processes must fit easily, and in a timely manner, into existing DBA data provisioning procedures.

Masking Repeatability

Weighting: High

The ability to mask data consistently, using either deterministic masking functions or cross-reference tables, means the data can be masked in the same manner across applications.
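The two repeatability techniques named here can be sketched as follows. This is a minimal illustration under stated assumptions, not any vendor's implementation: the key, the replacement surname list, `mask_surname` and `CrossReferenceTable` are all hypothetical names introduced for the example.

```python
import hashlib
import hmac

# Assumption: a secret key shared by every masking run; never hard-code in practice.
SECRET_KEY = b"example-key-not-for-production"

# Assumption: a small pool of replacement surnames.
SURNAMES = ["Archer", "Baker", "Carter", "Dawson", "Ellis", "Fisher", "Grant", "Hughes"]


def mask_surname(value, key=SECRET_KEY):
    """Deterministic masking: the same input and key always give the same output,
    so independent runs on different systems agree without sharing any state."""
    normalised = value.strip().upper().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(SURNAMES)
    return SURNAMES[index]


class CrossReferenceTable:
    """Cross-reference masking: store each original -> replacement pair and
    reuse it whenever the original value is seen again."""

    def __init__(self, replacements):
        self._unused = list(replacements)
        self._mapping = {}

    def mask(self, value):
        if value not in self._mapping:
            if not self._unused:
                raise ValueError("no replacement values left")
            self._mapping[value] = self._unused.pop(0)
        return self._mapping[value]


xref = CrossReferenceTable(SURNAMES)
first_pass = [xref.mask(name) for name in ["Smith", "Jones", "Smith"]]
```

The deterministic function needs no shared storage but can map two different inputs to the same replacement when the pool is small; the cross-reference table gives one-to-one replacements but has to be stored securely and shared between every system that masks the same data.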

Multiple Database and Platform Support

Weighting: High

A tool that only supports masking on a single platform or a single database type will result in different, inconsistent masking being set up across the enterprise.

Being able to mask data in legacy systems, such as IMS and VSAM, as well as in SQL Server, for example, is essential.

Multiple Masking Technology Stacks

Weighting: High

One size definitely does not fit all. Some vendors provide a single method of masking, for example in-place masking, or extracting into files, masking and returning. In reality, masking very large or complex applications across multiple platforms means that different technologies need to be used. These could include native database utilities, in-database functions or native mainframe masking.

Reporting and Auditing

Weighting: High

Reporting on what has been masked is required. However, a more important consideration is who chose what needs to be masked, why it needs to be masked, and when. In addition, there needs to be an audit of exactly what technology was used to perform the masking.

Flexible Masking Engines and Methodologies

Weighting: High

The masking product needs to provide multiple methods for the data team. Having a range of technology available, from simple to complex, lets teams choose an approach based on size, urgency and potential risk, and makes them much more responsive.

The technology should include in-place masking, extract and mask ‘in flight’, building shadow tables, and dynamic masking via views and message layers.

Dynamic Masking

Weighting: High

In some cases, ad hoc queries need to be made against real data. Access to this real data can be controlled by creating a masked transparency layer. This uses a set of views which mask certain fields consistently across databases. These views can also be adjusted to identify which users have access to which data. In addition, development applications can be set up to use the masked transparency layer so that data used by developers appears masked.

Dynamic masking can also be deployed at the message or SOAP level. This can be extremely useful for TDM teams as they can quickly provide access to web services via a proxy. The proxy masks the data ‘in-flight’.
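A masked transparency layer built from views can be sketched with Python's built-in `sqlite3` module. The `customer` table, its columns and the `customer_masked` view are hypothetical, and SQLite stands in for whichever RDBMS is actually in use; per-user access control on the view is left to that database's own grant mechanism and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table holding real data.
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT,
        card_number TEXT,
        city TEXT
    );
    INSERT INTO customer VALUES (1, 'Adam Smith', '4111111111111111', 'Leeds');
    INSERT INTO customer VALUES (2, 'Jane Jones', '5500000000000004', 'York');

    -- The masked layer: same shape as the table, sensitive fields replaced.
    CREATE VIEW customer_masked AS
    SELECT id,
           'Customer ' || id                          AS name,
           'XXXXXXXXXXXX' || substr(card_number, -4)  AS card_number,
           city
    FROM customer;
    """
)

# Development tools and ad hoc queries point at the view, not the table.
masked_rows = conn.execute(
    "SELECT id, name, card_number, city FROM customer_masked ORDER BY id"
).fetchall()
```

Because the masking lives in the view definition, the underlying rows are never changed, and the same definition can be repeated on other databases to keep the masked output consistent.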

No SQL Masking Engine

Weighting: High

Some dynamic masking engines try to interpret the SQL and mask the data passing to and from the database. Virtually all RDBMSs support views, and many also support synonyms, so using the RDBMS’s own built-in features is a much more sensible and standard approach.

Subsetting in Conjunction with Masking

Weighting: High

A lot of current data legislation refers to ‘minimal data’ being used. Adding subsetting to a masking project should be easy and is highly recommended. It can also quickly improve the run times of data provisioning and the agility of teams.

Complex Flat File Structures

Weighting: Medium

Being able to verify that flat file structures are valid (see Data Quality) as well as fully understood is key. Many enterprise systems will contain multiple definitions of files and messages; being able to verify these and mask effectively is essential.

Being able to Mask Isolates

Weighting: High

Dependent on the level of masking required, being able to mask isolated values, for [...] do not want one piece of information being able to be used to trace back to a specific user or account.

Being able to Mask Trends

Weighting: Medium

If an entire masked database is lost, the general trends in the data still have commercial value. Being able to mask these trends, while still maintaining application integrity, is essential for fully secure masking.

Subsetting can help with this issue, as can using data constellations to provide the essence of all the data without the data trends.

Data Constellations

Weighting: High

For very highly regulated markets, shipping masked data offshore is very problematic, and the inability to send data offshore can increase costs. A data constellation looks for the data dimensions that exist in production (essentially the major attributes of each transaction) and links them with synthetic and/or masked PII data. This allows ‘production-like’ data to be provisioned with none of the real content.

Richness of Functions, as well as Custom Masking Routines

Weighting: High

Most masking tools allow addresses and names to be masked. However, more complex types of mask, such as IBANs and other values containing checksums, need to be included.

In addition, the ability to build local custom masking routines, or to integrate existing masking routines, should be included.
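To show why checksum-bearing values need dedicated masking functions, the sketch below masks an IBAN and then recomputes its two check digits using the standard mod-97 rule, so the masked value still passes validation. The key and the function names are assumptions for this example; it also keeps the country code and any letters in the account part, and it does not enforce country-specific lengths or formats.

```python
import hashlib
import hmac

# Assumption: a secret masking key; never hard-code one in practice.
SECRET_KEY = b"example-key-not-for-production"


def _mod97(iban):
    """Mod-97 remainder: move the first four characters to the end,
    convert letters to numbers (A=10 ... Z=35), then take the remainder."""
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97


def is_valid_iban(iban):
    """Check digits only; country-specific length rules are not checked."""
    compact = iban.replace(" ", "").upper()
    return len(compact) > 4 and compact.isalnum() and _mod97(compact) == 1


def mask_iban(iban, key=SECRET_KEY):
    """Replace the digits of the account part deterministically, then
    recompute the two check digits so the result is still a valid IBAN."""
    compact = iban.replace(" ", "").upper()
    country, bban = compact[:2], compact[4:]
    digest = hmac.new(key, compact.encode("ascii"), hashlib.sha256).digest()
    # Each digit becomes a key-derived digit (slightly biased; fine for a sketch).
    masked_bban = "".join(
        str(digest[i % len(digest)] % 10) if ch.isdigit() else ch
        for i, ch in enumerate(bban)
    )
    check = 98 - _mod97(country + "00" + masked_bban)
    return f"{country}{check:02d}{masked_bban}"


masked_iban = mask_iban("GB82 WEST 1234 5698 7654 32")
```

A naive mask that simply scrambles the digits would almost always fail the mod-97 test, and applications that validate IBANs on input would then reject the masked test data.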

Advanced Masking Functionality

Weighting: High

As a project develops, more complex masking requirements are often discovered. The masking tool must be able to handle these complex needs. A typical example would be multi-value, multi-column cross-referencing. For instance, the names Adam Smith, A Smith and ASMITH need to be masked consistently. Many vendors do this by simply hand-building SQL to be run prior to the mask.
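One possible way to keep such variants consistent is to resolve each variant to a single identity, mask that identity once, and then render the masked identity in the same format as the input. The sketch below assumes only three formats (full name, initial plus surname, and an upper-case login of initial followed by surname); the cross-reference content and function names are hypothetical.

```python
# Hypothetical cross-reference: each real identity maps to one masked identity.
IDENTITY_XREF = {
    ("ADAM", "SMITH"): ("Brian", "Walker"),
}


def _find_identity(given, surname):
    """Match on surname plus either the full first name or its initial."""
    given, surname = given.upper(), surname.upper()
    for (first, last), masked in IDENTITY_XREF.items():
        if last != surname:
            continue
        if given == first or (len(given) == 1 and first.startswith(given)):
            return masked
    raise KeyError(f"no cross-reference entry for {given!r} {surname!r}")


def mask_name_variant(value):
    """Mask a name and return it in the same format it arrived in."""
    parts = value.strip().split()
    if len(parts) == 2:
        given, surname = parts
        m_first, m_last = _find_identity(given, surname)
        if len(given) == 1:
            return f"{m_first[0]} {m_last}"   # 'A Smith' style
        return f"{m_first} {m_last}"          # 'Adam Smith' style
    if len(parts) == 1 and len(parts[0]) > 1:
        login = parts[0]
        m_first, m_last = _find_identity(login[0], login[1:])
        return (m_first[0] + m_last).upper()  # 'ASMITH' style
    raise ValueError(f"unrecognised name format: {value!r}")


masked_variants = [mask_name_variant(v) for v in ["Adam Smith", "A Smith", "ASMITH"]]
```

All three inputs resolve to the same masked person, so joins and lookups across columns that hold the name in different forms still line up after masking.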

Integration with a Test Data on Demand Strategy and Platform

Weighting: Medium

Masking can be time-consuming and tedious. Being able to use this work to provide a better approach to test data delivery will improve the quality of development and reduce the number of bugs that make it into production.

Performance

Weighting: Mandatory

The masking technology stack needs to be able to mask medium to large databases very rapidly. Being able to fit a run into a nightly or on-demand window is essential; developers and testers cannot wait days for production refreshes.

In some cases, we have seen our technology run 100 times faster than competitors’ technology. This is particularly important in databases with multiple billions of rows.

Mainframe Support

Weighting: High

Local support on the mainframe is essential. ETL processes or ODBC layers will not perform adequately and do not fit in with normal mainframe batch estates.

Reversible Masking

Weighting: Low

Being able to work your way back from a masked value can be useful, and this can be set up in a number of different ways.

Agile Development Support

Weighting: High

Masking tools that cannot deliver data to Agile teams at the beginning, middle and end of a sprint, through multiple database and meta-model changes, should not be considered.

No ETL Required

Weighting: Mandatory

Some products require that all data is transformed into another database, flat file or platform before it can be masked. This causes very long delays and introduces a high level of complexity that is not required for masking projects. In addition, a high level of CPU usage is needed to move data back and forth.

Data Quality Management

Weighting: High

When masking data, the quality of the data must be considered. If production data contains ‘bad’ data, then consideration should be given to retaining that ‘bad’ data in development.

Masking tools should be able to identify these outliers and then be configured to pass the data on. For ETL projects, this is a must, as the migration code must be able to check for non-standard data.

Consulting Services

Weighting: Medium to Low

Products should be stand-alone and usable after training. Having consulting services
