Data Quality Management
The Most Critical Initiative You Can Implement
SUGI 29, Montreal, May 2004
Claudia Imhoff, President
Jonathan G. Geiger, Executive Vice President
Topics
What is Data Quality Management?
Data Quality Management Challenges
Data Quality Definition
Four Pillars of Data Quality Management
Data is an Asset
Other corporate assets include
People
Capital (Money)
Property
Materials
Assigning value is difficult
Establishing ROI for Data Quality
What is Data Quality Management?
Establishment and deployment of:
Roles
Responsibilities
Policies
Procedures
concerning the acquisition, maintenance, dissemination and disposition of data
Viability of business decisions – contingent on good data...
Good data – contingent on an effective approach to Data Quality Management
Data Quality Management
Responsibilities
Business Responsibilities
Business rules governing data
Data quality verification
Information Technology Responsibilities
Manage environment for acquiring, maintaining, disseminating, and disposing of electronic data
Architecture
Program Manager and Project Leader
Organization Change Agent
Business Analyst and Data Analyst
Data Steward
Data Quality Management
Components
Reactive: addresses problems that already exist
Deal with inherent data problems, integration issues, merger and acquisition challenges
Proactive: diminishes the potential for new problems to arise
Governance, roles and responsibilities, quality expectations, supporting business practices, specialized tools.
Data Quality Management
Importance
Companies often realize the importance too late
Only after several documented problems with the data do they recognize the need to improve its quality.
Billions of dollars are lost annually due to data quality problems.
Additional estimates have shown that 15-20% of the data in a typical organization is erroneous or otherwise unusable.
The importance of Data Quality Management should be evident – so why aren’t companies addressing it more aggressively?
Topics
What is Data Quality Management?
Data Quality Management Challenges
Data Quality Definition
Data Quality Management
Challenges: Responsibility
No single business unit is responsible for enterprise data
Once data is captured in an operational system, the business unit washes its hands of further responsibility
Savvy corporations adopt a data stewardship approach
Data Quality Management
Challenges: Cross Functionality
Horizontal alignment in a vertical world
Data Quality Management crosses organizational boundaries
Data Quality Management
Challenges: Problem Recognition
Corporation must recognize that it HAS a Data Quality Management problem
Is your company in denial?
Getting money for an unrecognized problem is
Data Quality Management
Challenges: Discipline
Downstream impacts must be understood and considered in decisions
Corporation must define and assign responsibilities
In job descriptions
Data Quality Management
Challenges: Investment
Time
Funding
Resources
All needed to overcome “unquality”
Examples
Duplicate materials to the same customer or prospect
Data Quality Management
Challenges: On-Going Effort
This is not a one-time effort
Data Quality Management staffing is required
Should reduce staffing requirements elsewhere
Governance is the name of the game
Data Quality Management
Challenges: Return on Investment
What is the cost of “unquality”?
Work-arounds absorbed into daily processes
Topics
What is Data Quality Management?
Data Quality Management Challenges
Data Quality Definition
Quality - Definition
Quality is conformance to requirements
Whose requirements?
How are requirements set?
Defect Rate Target
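The idea of quality as conformance to requirements, measured against a defect rate target, can be sketched in a few lines. This is only an illustration: the function names, the postal-code requirement, and the 5% target are assumptions made for the example, not part of the presentation.

```python
from typing import Callable, Iterable


def defect_rate(records: Iterable[dict], requirement: Callable[[dict], bool]) -> float:
    """Return the fraction of records that fail the stated requirement."""
    records = list(records)
    if not records:
        return 0.0
    defects = sum(1 for record in records if not requirement(record))
    return defects / len(records)


def meets_target(rate: float, target: float) -> bool:
    """A data set conforms when its defect rate is at or below the target."""
    return rate <= target


# Hypothetical requirement: every customer record carries a five-digit postal code.
def has_valid_zip(record: dict) -> bool:
    zip_code = record.get("zip") or ""
    return len(zip_code) == 5 and zip_code.isdigit()


customers = [
    {"id": 1, "zip": "27513"},
    {"id": 2, "zip": ""},       # missing value: a defect
    {"id": 3, "zip": "2751"},   # wrong length: a defect
    {"id": 4, "zip": "80301"},
]

rate = defect_rate(customers, has_valid_zip)
conforms = meets_target(rate, target=0.05)  # assumed 5% target
```

The requirement is passed in as a function because the slide's two questions (whose requirements, and how they are set) are business decisions; the measurement itself does not change when the requirement does.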
Quality - Definition
Quality is not
Quality - Definition
To the user, the data warehouse is the source
Data model provides basis for data collection
Definitions
Validation rules
Relationship rules
Actual data must also be examined
Operational business process implications
Abuse of defined fields
Undocumented business rules
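[Figure: data completeness (up to 100%) plotted against cost ($). Labels: "Complete but with errors" (very dangerous); "Perfect data" (expensive); "Incomplete but accurate"; one label is truncated in the source ("May be aproto-").]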
Four Types of Error Correction
Reject the error
Accept the error
Correct the error
Use a default value for the data in error
Reject the Error!
Better to have missing data than inaccurate data
Reject the complete record
Correct at the source and re-extract the data
Accept the Error!
Data error is within tolerance limits
Correct data at the source
If not correctable, provide meta data on the error
Correct the Error!
Data essential for completeness
Correction is required
Use temporary file
Use Default Value for Data in Error!
Data needed for completeness
Data is unusable as is
Data value is replaced with a default value
Meta Data must be used to explain when and how the default is used
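The four ways of handling an error described above (reject, accept, correct, use a default) can be sketched as a single dispatch function. This is a minimal illustration under stated assumptions: the `Action`, `Outcome`, and `handle_error` names and the shape of the meta data dictionary are hypothetical, chosen only to show that every path other than a clean load leaves meta data behind.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Action(Enum):
    REJECT = "reject"
    ACCEPT = "accept"
    CORRECT = "correct"
    DEFAULT = "default"


@dataclass
class Outcome:
    value: Any          # value to load; None when nothing is loaded
    loaded: bool        # False means the data was kept out of the warehouse
    meta: dict = field(default_factory=dict)  # explains what was done and why


def handle_error(value: Any, action: Action, *, corrected: Any = None,
                 default: Any = None, reason: str = "") -> Outcome:
    """Apply one of the four error-handling choices to a value known to be in error."""
    if action is Action.REJECT:
        # Better missing than inaccurate: nothing is loaded.
        return Outcome(None, False, {"action": "reject", "reason": reason})
    if action is Action.ACCEPT:
        # Within tolerance: load as is, but record meta data on the error.
        return Outcome(value, True, {"action": "accept", "reason": reason})
    if action is Action.CORRECT:
        if corrected is None:
            raise ValueError("a corrected value is required")
        return Outcome(corrected, True,
                       {"action": "correct", "original": value, "reason": reason})
    if action is Action.DEFAULT:
        # Meta data records when and how the default was used.
        return Outcome(default, True,
                       {"action": "default", "original": value, "reason": reason})
    raise ValueError(f"unknown action: {action}")


rejected = handle_error("13/45/2004", Action.REJECT, reason="invalid date")
defaulted = handle_error("??", Action.DEFAULT, default="UNKNOWN",
                         reason="unrecognized state code")
```

Which action applies to which field is a business decision; the sketch only shows that the choice can be made explicit and recorded.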
Topics
What is Data Quality Management?
Data Quality Management Challenges
Data Quality Definition
Four Pillars of Data Quality Management
Four Pillars of Data Quality Management
Data Profiling – Gaining an understanding of existing data relative to quality specifications
This is your starting point from which improvement (and ROI) is measured
Is the data complete? Is the data accurate?
Data Quality – Gaining an understanding of the causes of quality problems
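The two profiling questions above (is the data complete, is the data accurate) can be turned into baseline measurements. The sketch below is an assumption-laden illustration: `profile_column`, the sample rows, and the list of valid state codes are invented for the example, and "accuracy" here means only that a present value passes a supplied validity check.

```python
from typing import Any, Callable


def profile_column(rows: list[dict], column: str,
                   is_valid: Callable[[Any], bool]) -> dict:
    """Measure completeness and accuracy of one column as fractions in [0, 1]."""
    total = len(rows)
    # A value counts as present when it is neither missing nor an empty string.
    present = [row[column] for row in rows if row.get(column) not in (None, "")]
    valid = [value for value in present if is_valid(value)]
    return {
        "completeness": len(present) / total if total else 0.0,
        "accuracy": len(valid) / len(present) if present else 0.0,
    }


# Hypothetical sample: five customer rows and an assumed set of valid state codes.
VALID_STATES = {"NC", "CO", "NY"}
rows = [
    {"id": 1, "state": "NC"},
    {"id": 2, "state": ""},
    {"id": 3, "state": "XX"},
    {"id": 4, "state": "CO"},
    {"id": 5, "state": None},
]

baseline = profile_column(rows, "state", lambda value: value in VALID_STATES)
```

Running the same measurement again after a cleanup effort gives the before-and-after comparison from which improvement is measured.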
Four Pillars of Data Quality Management
Data Integration – Collapsing disparate versions of data into a single one
Recognition that same data exists in multiple locations with variable content
Standardize the multiple versions (e.g., customers, products, geographies, etc.) to single version
Data Augmentation – incorporation of additional external data to gain insight
Combine internal customer data with third party data to increase understanding of the customer
External data – competitor, customer demographic or credit history, total industry sales data
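Integration and augmentation, as described above, can be illustrated together: first collapse the multiple versions of a customer into one, then attach third-party attributes. This is a rough sketch only. The name-standardization rule, the source system names, and the external data are assumptions for the example; real matching usually needs far more than stripping punctuation and company suffixes.

```python
SUFFIXES = {"inc", "corp", "co", "ltd"}  # assumed noise words for this example


def standardize_name(name: str) -> str:
    """Reduce a company name to a simple matching key."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(word for word in cleaned.split() if word not in SUFFIXES)


def integrate(records: list[dict]) -> dict:
    """Collapse records that share a standardized name into a single version."""
    merged: dict = {}
    for record in records:
        key = standardize_name(record["name"])
        target = merged.setdefault(key, {"name_key": key, "sources": []})
        target["sources"].append(record["source"])
        for field_name, value in record.items():
            # The first non-empty value seen for a field wins.
            if field_name not in ("name", "source") and value and field_name not in target:
                target[field_name] = value
    return merged


def augment(merged: dict, external: dict) -> dict:
    """Add external attributes without overwriting internal ones."""
    return {key: {**external.get(key, {}), **record} for key, record in merged.items()}


records = [
    {"source": "billing", "name": "Acme, Inc.", "phone": "555-0100"},
    {"source": "crm", "name": "ACME Inc", "city": "Montreal"},
    {"source": "crm", "name": "Globex Corp.", "city": "Cary"},
]
third_party = {"acme": {"industry": "manufacturing"}}  # hypothetical external data

single_version = integrate(records)
augmented = augment(single_version, third_party)
```

Keeping the list of contributing sources on each merged record makes it possible to trace a value back to the system it came from when a discrepancy surfaces.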
Topics
What is Data Quality Management?
Data Quality Management Challenges
Data Quality Definition
Getting Started
Education
Stewardship Program
Partnerships & Environment
Four-Phase Program
Education
Involve key data warehouse effort participants
Business users
Developers
Influencing people
Better chance of getting commitment
Involves various techniques
Facilitated sessions
Interviews
Stewardship - Definition
Webster’s Dictionary: A steward is one who is called upon to exercise responsible care over possessions entrusted to him/her
The steward does not own the possessions
The steward has a responsibility affecting the processes that impact the possessions
The steward may be a business unit or
Data Steward Responsibilities
We need to approach this in an organized manner
Data acquisition
• Processes
• System roles
• Update authority
• Validation rules
• Business rules
• Quality
Dissemination
• Access security
• Standard queries and reports
• Capabilities
• System use
• Quality
• Meta data provided
Disposal
• Retention
• Erasure
Data management
• Data models
• Demographics
• Naming standards
• Meta data requirements
Partnerships & Environment
[Diagram: partnership among Executive Management, Middle Management, Business Units and Information Technology]
Partnerships & Environment
Address quality issues explicitly
Address known quality problems
Business processes
Operational data
Ensure environment supports quality
Properly train and equip team
Partnerships & Environment
Quality expectations must be:
Understood
Negotiated
Communicated
Met
Quality is a business issue, NOT just a technical issue
Quality is not an issue for one business unit; it is a horizontal activity
Quality Committee
Data Stewardship
Technology Support
Data Quality Management companies like DataFlux are available to help you get started.
They can:
Help you determine your Data Quality Management needs
Develop a plan to help meet your needs
Provide the technology, methodology and services to execute your plan
Summary
Data Quality Management is not a luxury – it is essential
The first step is to recognize that you have data “unquality”
A sound program consists of four pillars
Getting started requires commitment and