© 2012 IBM Corporation
Information Governance and Data Quality
Stefano Mino – EU Sales Leader InfoSphere Information Integration
Increasing Complexity: 1 trillion connected devices in the world
Declining Quality: $8.2 million annual loss by the average organization due to poor data quality
Protecting Privacy: $204 cost per compromised record
Ensuring Compliance: $29.8 billion U.S. spending on governance, risk and compliance
Most organizations continue to struggle with data
“CIOs must determine whether your Information Governance strategy adequately reflects the relationship to your overall information management initiative. If the relationship is unclear, or the stated goals are different, work with the business to refactor your strategy.”
Organizations continue to have Data Quality Challenges
Compliance and transparency pressures increasingly highlight data quality issues
– No method to maintain high-quality data
Unreliable Insights
– Low data quality leads to lack of trust and results in poor business decisions
– Inability to identify the source of quality issues
– Unreliable insights propagate to other strategic initiatives, which then base key business decisions on bad data
High costs & negative customer satisfaction
– Organizations are recognizing that there can be both direct (missed revenue opportunity) and indirect (low customer satisfaction and high churn) financial costs from poor data quality.
Survey: Data quality software is viewed as critical technology . . .
79% of survey respondents indicated they had deployed their tools of choice in more than one project or deployment, compared to only 58% in 2008.
2009 Survey on Data Quality Tools Highlights Broadening Deployments With Focus on Proven Functionality, Gartner, 14 August 2009/ID Number: G00170331
The Cost of Dirty Data
83% of Data Integration projects either overrun or fail
Inaccurate or incomplete data is a leading cause of failure in business-intelligence and CRM projects
25% of time is spent resolving bad data
Defects left undetected upstream cost 10 to 100 times as much to fix downstream
Low data quality costs companies $611 billion annually
Scrap and rework; increased costs
IBM Information Governance creates order out of information chaos
Orchestrate people, process and technology toward a common goal
– Promotes collaboration
– Derives maximum value from information
Information Governance is the exercise of decision rights to optimize, secure and leverage data as an enterprise asset.
Governing the creation, management and usage of enterprise data is no longer optional. It is:
– Expected by your customers
– Demanded by your executives
– Enforced by regulators/auditors
Leverage information as an enterprise asset to drive opportunities
– Safeguard information
– Ensure the highest quality
Information Governance
Success requires governance across the “Information Supply Chain”
Govern: Quality, Security & Privacy, Lifecycle, Standards
Analyze, Integrate, Manage
Transactional & Collaborative Applications, Business Analytics Applications, External Information Sources
Cubes, Big Data, Master Data, Content, Data, Streaming Information, Data Warehouses, Content Analytics
What is Information Governance?
IBM defines Information Governance as a holistic approach to managing and leveraging information for business benefits. It encompasses information quality, information protection and information life cycle management.
Proactively leveraging information . . . to unlock value and manage risk
– Ensure information is understood and consistently defined.
– Increase the use and trust of information as an enterprise asset.
– Protect information, reduce risk and comply.
Example data governance organization (chart): Executive Sponsorship, Executive-Level Data Governance Bodies, Risk Data Council, Risk Data Governance Office (DGO), Data Governance Program Manager, Data Governance PMA, Technical Liaisons (4), Business Liaisons (4), Metadata Liaison (1), Data Quality Reporting Liaison (1), Lead Steward, Data Definition / Data Production / Data Usage / Quality Measurement Stewardship Functions, Line of Business Stewardship Community, Data Quality Reporting Team, Project Teams, Virtual Teams.
2.1.1 Build Team (process flow): 1. Select Data Domain Steward; 2. Map Data Domain to Lines of Business; 3. Identify Domain SMEs and Stakeholders; 4. Identify SMEs for Applications; 5. Note Potential Data Stewards During Domain Definition; 6. Recognize Data Definer, User, and Producer Stewards; 7. On-Board Stewards; 8. Mentor Stewards; 9. Mobilize Data Stewards. Roles: Data Quality Team (Business & Data Analysts), Data Domain Steward, LOB/Functional Area Data Steward Coordinator, Data Steward Committee, Data Governance Council. Artifacts: Resource Checklist Template, Process Manual. Steady state: DG Council and Data Steward Committee* are established.
* If the Data Steward Committee is not yet established, LOB Coordinators (who will eventually be on it) will serve this function.
2.1.2 Build Common Definition (process flow, initiated by the Data Governance Council): 1. Create Domain Boundaries Draft Subject Area List; 2. Determine Domain Boundaries; 3. Validate Domain Scope; 4. Create Conceptual Data Model; 5. Obtain JAD Session Guide; 6. Create JAD Session Guide and Draft Element List; 7. Prepare Pre-JAD Session Communications; 8. Conduct Subject Area-Focused JAD Session; 9. Document Business Definition Findings; 10. Document Data Standards & Rules Findings; 11. Update Conceptual Data Model; 12. Update Subject Area List; 13. Have all subject areas been sufficiently explored? (yes/no); 14. Validate Data Element List and Conceptual Model; 15. Initiate CLDM Maintenance Process; 16. Update Glossary of Terms with CLDM Terms; 17. Validate Glossary of Terms. Roles: LOB/Functional Area Data Steward Coordinators, Data Quality Team (Modeler), Data Quality Team (Business Analyst), Data Domain Steward. Templates: Scope Summary Template, JAD Session Guide, Glossary of Terms Template, DQ Rules & Standards Template.
2.1.3 Build Data Quality Rules and Standards (process flow): 1. Review Elements in DQ Rules & Standards; 2. Gather Information on Data Elements; 3. Determine Application Instances of Data Elements; 4. Create Draft DQ Rules and Standards; 5. Is more SME input needed? (yes/no); 6. Conduct Additional JAD Sessions or Meetings; 7. Verify DQ Rules and Standards; 8. Validate DQ Rules & Standards; 9. Capture Dashboard/Scorecard/Reporting Requirements/Scope; 10. Create Data Quality Dashboard Mock-Up; 11. Validate Dashboard Mock-Up; 12. Mock-up meets needs? (yes/no). Roles: Data Quality Team (Business Analyst), LOB/Functional Area Data Steward Coordinators, Data Domain Steward, Data Quality Scorecard Team. Template: DQ Rules and Standards Template.
Results from good Information Governance
Understand your information
– Know what exists and how it is related
– Ensure common understanding and definitions
Contain costs
– Manage costs despite continuous growth
– Retain information without growing retention costs
Maximize value from your information
– Make decisions that you can trust
– Increase revenues
– Reduce costs
Secure and protect
– Keep information safe from internal and external threats
– Know who is accessing what information and why
Comply with regulatory requirements
– Retention, security, filings, audits
Good governance requires process and accountability
IBM Information Governance Unified Process
Required and optional steps:
1) Define Business Problem
2) Obtain Executive Sponsorship
3) Conduct Maturity Assessment
4) Build Roadmap
5) Establish Organizational Blueprint
6) Build Business Glossary
7) Understand Data
8) Create Metadata Repository
9) Define Metrics
10) Govern Data Quality
11) Govern Master Data
12) Govern Lifecycle of Information
13) Govern Security & Privacy
14) Govern Big Data
15) Measure Results
In Context: real-time delivery of relevant information when and where it’s needed
Complete: related information reconciled into a single and holistic view
Accurate: complex and disparate data transformed, cleansed and delivered
Insightful: derive meaning from information challenges
Align business and IT objectives using a single platform that creates trusted information for use in key initiatives
Sources: legacy apps, databases, XLS, XML, flat files, warehouse, z/OS, custom
Business initiatives: BI, SAP, warehouse, MDM, ERP, application consolidation
Users: business analysts, executives, enterprise architects, data analysts & architects, subject matter experts, system manager, developer, DBA, system architect, data steward
Example business case for data quality in marketing
A. Total number of customers in the marketing list: 950,000
B. Number of individual party matches: 40,000
C. Additional duplicate individuals who are double-counted as part of a household: 50,000
D. Total number of duplicate matches: 90,000
E. Number of annual marketing mailings per customer: 2
F. Cost per mailing: $3.25
G. Total avoidable cost of duplicate mailings (D × E × F): $585,000
H. Outbound telemarketing calls per customer per year: 4
I. Cost per outbound telemarketing call: $1.50
J. Total avoidable cost of outbound telemarketing calls (D × H × I): $540,000
K. Total avoidable cost of duplicate matches (G + J): $1,125,000
L. Cost to implement data quality tools: $500,000
M. Annual cost of full-time customer data steward: $200,000
N. Total cost of data quality solution (L + M): $700,000
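The arithmetic in this business case can be checked with a few lines of Python. The sketch reproduces G, J, K and N from the slide's inputs (D appears to be B + C) and adds one derived figure the slide leaves implicit: K - N, which reads as a first-year net saving only under our assumption that K recurs annually and that L is a one-time cost.

```python
# Inputs taken from the business case above.
individual_matches = 40_000       # B
household_duplicates = 50_000     # C
mailings_per_customer = 2         # E
cost_per_mailing = 3.25           # F
calls_per_customer = 4            # H
cost_per_call = 1.50              # I
tool_cost = 500_000               # L (assumed one-time)
steward_cost = 200_000            # M (annual)

duplicates = individual_matches + household_duplicates      # D = B + C
g = duplicates * mailings_per_customer * cost_per_mailing   # G = D x E x F
j = duplicates * calls_per_customer * cost_per_call         # J = D x H x I
k = g + j                                                   # K = G + J
n = tool_cost + steward_cost                                # N = L + M

print(f"D = {duplicates:,}")
print(f"K = ${k:,.0f}")
print(f"N = ${n:,.0f}")
# Derived here, not stated on the slide.
print(f"K - N = ${k - n:,.0f}")
```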
Put the right standards in place
Required and optional steps: 6) Build Business Glossary, 7) Understand Data, 8) Create Metadata Repository, 9) Define Metrics, 15) Measure Results
Stakeholders: IT Architect, Marketing Manager, Support Rep, CRM Project Manager, Business Intelligence Manager, ERP Project Manager, Business Analyst, Financial Officer, Compliance Officer, Sales Lead
For example, define “Active Subscriber”:
– Mobile user who has used “any” service in the mobile network
– User who paid for the service at least once in the past 90 days
– Mobile user who has a phone plan, but not SMS
– Only post-paid customers, not pre-paid customers
– User who makes at least one call over a period of 90 days
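To show why a shared glossary matters, here is a minimal Python sketch that encodes the five “Active Subscriber” definitions as predicates over the same records. The record layout, field names, reporting date and sample subscribers are all hypothetical, and “phone plan, but not SMS” is read as “has a phone plan and no SMS service”, which is only one possible interpretation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

AS_OF = date(2012, 6, 30)  # hypothetical reporting date


@dataclass
class Subscriber:
    """Hypothetical subscriber record; field names are illustrative only."""
    id: str
    plan: str                   # "post-paid" or "pre-paid"
    has_phone_plan: bool
    has_sms: bool
    used_any_service: bool
    last_payment: Optional[date]
    last_call: Optional[date]


def within_90_days(d: Optional[date]) -> bool:
    """True if the event happened in the 90 days up to AS_OF."""
    return d is not None and 0 <= (AS_OF - d).days <= 90


# One predicate per stakeholder definition.
DEFINITIONS: Dict[str, Callable[[Subscriber], bool]] = {
    "used_any_service": lambda s: s.used_any_service,
    "paid_in_last_90_days": lambda s: within_90_days(s.last_payment),
    "phone_plan_without_sms": lambda s: s.has_phone_plan and not s.has_sms,
    "post_paid_only": lambda s: s.plan == "post-paid",
    "called_in_last_90_days": lambda s: within_90_days(s.last_call),
}

SUBSCRIBERS = [
    Subscriber("s1", "post-paid", True, True, True, date(2012, 6, 1), date(2012, 6, 20)),
    Subscriber("s2", "pre-paid", True, False, True, date(2012, 2, 1), date(2012, 6, 15)),
    Subscriber("s3", "post-paid", False, True, True, date(2012, 5, 15), None),
    Subscriber("s4", "pre-paid", True, False, False, None, None),
]


def active_ids(rule: Callable[[Subscriber], bool]) -> List[str]:
    """IDs of subscribers counted as 'active' under one definition."""
    return [s.id for s in SUBSCRIBERS if rule(s)]


for name, rule in DEFINITIONS.items():
    print(f"{name:24} {active_ids(rule)}")
```

The same four subscribers produce four different “active” lists, which is exactly the disagreement a business glossary is meant to resolve.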
Understand your information
– Data can be distributed over multiple applications, databases and platforms
– Relationships are complex and poorly documented
– Relationships are not well understood
Determine lineage of data
Credit Card Number: “a unique identification number issued to each card holder and unique to each card printed.”
1212 4545 6525 3092 1212454565253092
0000000085426938
Profit Amount: “a currency value that is calculated by combining data from the Customer Master database and Wholesale Inventory applications . . .”
Calculation included on monthly report $85,426,938
• View end-to-end lineage, including design metadata, operational metadata and user-defined metadata
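Lineage is, at heart, a directed graph from each data asset to the assets it is derived from. The Python sketch below walks that graph upstream from a report field, breadth first. The asset names and edges are hypothetical stand-ins for the Profit Amount example; a real lineage tool would build this graph from design and operational metadata rather than a hand-written dictionary.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage edges: each asset maps to the assets it is derived from.
LINEAGE: Dict[str, List[str]] = {
    "monthly_report.profit_amount": ["profit_job.profit_amount"],
    "profit_job.profit_amount": [
        "customer_master.customer_id",
        "wholesale_inventory.margin",
    ],
    "customer_master.customer_id": [],
    "wholesale_inventory.margin": [],
}


def upstream(asset: str) -> List[str]:
    """Breadth-first walk returning every asset that feeds `asset`."""
    seen = set()
    order: List[str] = []
    queue = deque(LINEAGE.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:          # guard against shared or cyclic inputs
            continue
        seen.add(node)
        order.append(node)
        queue.extend(LINEAGE.get(node, []))
    return order


print(upstream("monthly_report.profit_amount"))
```

Reversing the edges gives impact analysis instead: which downstream reports change if a source column changes.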
IBM Data Quality Value Life Cycle
Align with business objectives; assess & discover; specialized validation; cleanse & enrich; “master”; monitor / track; report & deliver insight
Shared metadata, connectivity & infrastructure
InfoSphere Information Analyzer: validate & monitor
InfoSphere QualityStage: cleanse & enrich
Analyze source data quality and monitor adherence to integration and quality rules
Requirements
– Perform data quality assessment
– Define business rules to monitor data quality
– Establish stewards for governance of data quality
Benefits
– Identify data quality issues early to reduce project risks
– Monitor quality metrics over time for compliance
– Create business confidence with trusted information
Applying Information Analyzer
The solution perspective in a variety of use cases
Diagram: Govern, Analyze, Integrate; internal data sources, external sources, packaged applications, Master Data Management, data warehouse, information delivery, BI applications, …; Information Analyzer
– Monitor quality at the source to address issues where information originates
– Monitor your trusted systems and their consistency with sources through transformations
– Report status & progress to the business
– Supply metrics to the governance initiative
Define (business-driven), test, deploy
Data Quality: Pervasive, Progressive, Continuous
Information Analyzer supports the full spectrum across all levels: Generic, Business Driven, Business Measured, Business Aligned
Common Measurements, Data Rules, Metrics, DQ Dashboard + Reports
Example: the tax-id field contains many nulls.
Rule: tax-id is not null and not the default value
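The tax-id rule above can be expressed as a simple predicate. This Python sketch assumes a hypothetical set of placeholder values that count as “default”; the slide does not say which defaults a real system uses, so treat `DEFAULT_TAX_IDS` as configuration you would fill in from profiling results.

```python
from typing import Iterable, List, Optional

# Hypothetical placeholder values treated as "default"; not from the slide.
DEFAULT_TAX_IDS = {"", "000000000", "999999999"}


def tax_id_ok(value: Optional[str]) -> bool:
    """Rule: tax-id is not null and not a default placeholder."""
    return value is not None and value.strip() not in DEFAULT_TAX_IDS


def failing_rows(values: Iterable[Optional[str]]) -> List[int]:
    """Return the positions of values that violate the rule."""
    return [i for i, v in enumerate(values) if not tax_id_ok(v)]


column = ["123456789", None, "000000000", None, "987654321"]
print(failing_rows(column))  # [1, 2, 3]
```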
Common Data Quality Dimensions and Measurements
– Domain quality: completeness, validity, length & format
– Cross-domain fitness: redundancy
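Each of these dimensions reduces to a ratio over column values. The Python sketch below is one plausible way to compute them, not how Information Analyzer calculates them internally; the accepted gender values, the five-digit ZIP pattern and the sample columns are hypothetical. Redundancy is measured here as the share of distinct values in one column that also appear in a column from another domain.

```python
import re
from typing import List, Optional, Set

Column = List[Optional[str]]


def _present(values: Column) -> List[str]:
    """Non-null, non-empty values."""
    return [v for v in values if v]


def completeness(values: Column) -> float:
    """Share of rows that are populated."""
    return len(_present(values)) / len(values) if values else 1.0


def validity(values: Column, domain: Set[str]) -> float:
    """Share of populated values found in the accepted domain."""
    present = _present(values)
    return sum(v in domain for v in present) / len(present) if present else 1.0


def format_conformance(values: Column, pattern: str) -> float:
    """Share of populated values whose whole text matches `pattern`."""
    present = _present(values)
    rx = re.compile(pattern)
    if not present:
        return 1.0
    return sum(rx.fullmatch(v) is not None for v in present) / len(present)


def redundancy(col_a: Column, col_b: Column) -> float:
    """Cross-domain: share of distinct values in col_a also found in col_b."""
    a, b = set(_present(col_a)), set(_present(col_b))
    return len(a & b) / len(a) if a else 0.0


gender = ["M", "F", None, "X", "F"]
zip_codes = ["12345", "1234", "98765", "54321"]
crm_ids = ["C1", "C2", "C3", "C4"]
billing_ids = ["C2", "C4", "C9"]

print(completeness(gender))                     # 0.8
print(validity(gender, {"M", "F"}))             # 0.75
print(format_conformance(zip_codes, r"\d{5}"))  # 0.75
print(redundancy(crm_ids, billing_ids))         # 0.5
```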
Data Rules
Specify consistent & reusable data rules, driven by the business
Examples of rules:
– The Gender field must be populated and must be in the list of accepted values.
– The Social Security Number must be numeric and in the format 999-99-9999.
– If Date of Birth exists AND Date of Birth > 1900-01-01 and < TODAY, then Customer Type equals ‘P’.
– The Bank Account Branch ID is valid in the Branch Reference master list.
Data rules are driven by business users (for example, “The account number must meet the following condition: …”), and data is validated against them.
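The four example rules translate directly into record-level checks. This Python sketch is an illustration of the rule logic, not Information Analyzer's rule syntax; the accepted gender codes, the branch reference list, the field names and the sample records are hypothetical. The conditional rule passes vacuously when its “if” part does not hold.

```python
import re
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

ACCEPTED_GENDERS = {"M", "F", "U"}            # hypothetical accepted-values list
BRANCH_REFERENCE = {"B001", "B002", "B017"}   # hypothetical Branch Reference master list
SSN_FORMAT = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")  # 999-99-9999


def gender_rule(r: Record) -> bool:
    """Gender must be populated and in the accepted list."""
    return r.get("gender") in ACCEPTED_GENDERS


def ssn_rule(r: Record) -> bool:
    """SSN must be digits in the 999-99-9999 layout."""
    ssn = r.get("ssn")
    return isinstance(ssn, str) and SSN_FORMAT.fullmatch(ssn) is not None


def customer_type_rule(r: Record) -> bool:
    """If DOB exists and 1900-01-01 < DOB < today, Customer Type must be 'P'."""
    dob = r.get("date_of_birth")
    if isinstance(dob, date) and date(1900, 1, 1) < dob < date.today():
        return r.get("customer_type") == "P"
    return True  # condition not met, so the rule does not apply


def branch_rule(r: Record) -> bool:
    """Branch ID must exist in the Branch Reference master list."""
    return r.get("branch_id") in BRANCH_REFERENCE


RULES: Dict[str, Callable[[Record], bool]] = {
    "gender": gender_rule,
    "ssn": ssn_rule,
    "customer_type": customer_type_rule,
    "branch": branch_rule,
}


def evaluate(records: List[Record]) -> Dict[str, List[int]]:
    """For each rule, list the positions of records that fail it."""
    return {name: [i for i, r in enumerate(records) if not rule(r)]
            for name, rule in RULES.items()}


records = [
    {"gender": "F", "ssn": "123-45-6789", "date_of_birth": date(1980, 5, 1),
     "customer_type": "P", "branch_id": "B001"},
    {"gender": None, "ssn": "123456789", "date_of_birth": date(1975, 1, 1),
     "customer_type": "C", "branch_id": "B999"},
    {"gender": "M", "ssn": "987-65-4321", "date_of_birth": None,
     "customer_type": "C", "branch_id": "B500"},
]
print(evaluate(records))
```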
Measure results vs. targets
– View metric & benchmark summaries
– Organize metrics and rules within user-defined folders
– Create metrics across single or multiple data rules
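A metric that spans several data rules can be as simple as a weighted average of their pass rates, compared against a benchmark. The Python sketch below assumes hypothetical rule counts, weights and a 95% target; it illustrates the idea of measuring results against targets, not the product's metric expression language.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RuleResult:
    """Outcome of one data rule run (hypothetical structure)."""
    tested: int
    passed: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.tested if self.tested else 1.0


def weighted_metric(results: Dict[str, RuleResult], weights: Dict[str, float]) -> float:
    """Combine the pass rates of one or more rules into a single score."""
    total = sum(weights.values())
    return sum(results[name].pass_rate * w for name, w in weights.items()) / total


results = {"gender": RuleResult(tested=1000, passed=980),
           "ssn": RuleResult(tested=1000, passed=900)}
weights = {"gender": 1.0, "ssn": 3.0}   # hypothetical: SSN errors weigh 3x more
benchmark = 0.95                        # hypothetical target

score = weighted_metric(results, weights)
status = "meets target" if score >= benchmark else "below target"
print(f"Customer identity metric: {score:.1%} ({status}, target {benchmark:.0%})")
```

Weighting lets one metric reflect business priorities: here a high gender pass rate cannot mask a weak SSN pass rate.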
Comprehensive reporting and tracking environment
From high level dashboard to flexible views
Quickly assess the health of your information in summary dashboard view
Drill into specific data quality assessment results
Understand the details from multiple perspectives, based on flexible configuration
Validating Data Rules in InfoSphere QualityStage/DataStage
– Embed Information Analyzer Data Rule Definitions in DataStage/QualityStage jobs
– Create new data rules through the DataStage/QualityStage Designer
– Enables an integrated and comprehensive development environment across QualityStage, DataStage and Information Analyzer
Standardize, cleanse and deduplicate data, ensuring a complete, accurate view of information
– Removes duplicates
– Cross-references matching records
– Survives a single, complete record
– Validates and enriches data
Requirements
– Cleanse data
– Manage duplicate data
– Enable ongoing quality
Benefits
– Highly accurate for fast ROI
– Resolution of data quality issues
– Standardization of data formats
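The remove-duplicates, cross-reference and survive steps can be sketched in a few lines of Python. This is a deliberate simplification: records match only on an exact normalized key (name plus postcode), and survivorship simply takes the first non-empty value per field from the most complete records first. QualityStage's matching and survivorship rules are configurable and more sophisticated than this; the field names and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def match_key(r: Record) -> Tuple[str, str]:
    """Deterministic key: upper-cased, whitespace-collapsed name plus compact postcode."""
    name = " ".join((r.get("name") or "").upper().split())
    postcode = (r.get("postcode") or "").replace(" ", "").upper()
    return name, postcode


def survive(group: List[Record]) -> Record:
    """Build one record per group: for each field, take the first non-empty
    value, looking at the most complete records first (ties keep input order)."""
    ranked = sorted(group, key=lambda r: sum(bool(v) for v in r.values()), reverse=True)
    fields = sorted({f for r in group for f in r})
    return {f: next((r[f] for r in ranked if r.get(f)), None) for f in fields}


def deduplicate(records: List[Record]):
    """Return surviving records plus a cross-reference of matched source IDs."""
    groups: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r)
    survivors = [survive(g) for g in groups.values()]
    xref = {key: [r["id"] for r in g] for key, g in groups.items()}
    return survivors, xref


records = [
    {"id": "1", "name": "John Smith", "postcode": "SW1A 1AA", "phone": None, "email": "js@example.com"},
    {"id": "2", "name": "JOHN  SMITH", "postcode": "sw1a1aa", "phone": "020 7946 0000", "email": None},
    {"id": "3", "name": "Mary Jones", "postcode": "M1 1AE", "phone": None, "email": None},
]
survivors, xref = deduplicate(records)
print(survivors[0])  # one John Smith record carrying both the email and the phone
print(xref)
```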
IBM InfoSphere – Your Trusted Platform for Trusted Information
– Intelligent: prebuilt, automated, proactive
– Integrated: capabilities designed to address enterprise use cases
– Comprehensive: covering the full information supply chain
InfoSphere is a market leader in every category of Information Integration and Governance
Next steps in Information Governance
IBM Information Governance Council
– Established Information Governance Council over five years ago
– Developed a Maturity Model for Information Governance, leveraged by over 250 customers
– Community now exceeds 1,500 members
– Join the community: www.infogovcommunity.com
– Self assessment
Workshops and assessments
For more information: www.ibm.com/informationgovernance
BACKUP
WHAT IS NEW IN INFOSPHERE
INFORMATION SERVER FOR
Key Data Quality Enhancements
– New Information Governance Rules & Policies
– New Standardization Rules Designer
– New Data Quality Console
– New Address Verification Module
– Extended platform support
Data Validation Rules: Flexible Output Table Configuration, Sequencing & Impact Analysis
– Flexible configuration of output tables for Data Validation Rules (naming, append/overwrite)
– Registration & reuse of output tables
– Sequencing of Data Validation Rules
– Advanced web-based Data Validation Rule display, including lineage and impact analysis
Data Validation Rules
User-named output table configuration & sequencing
Define the name of output tables for Data Validation Rules:
– Simple user-named tables: a single table for a single rule
– Advanced user-named tables: one or more rules can update the same table (common format required)
– Configure whether to append or overwrite values in the output table
Workflow example (diagram): Source DB, Data Validation Rules 1, 2 and 3, Output Table 1
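The interplay of shared output tables, append/overwrite modes and sequencing can be illustrated with Python's built-in `sqlite3` module. In this sketch (all table names, rules and source rows are hypothetical) three rules share one table in a common format; the first rule in the sequence overwrites and the next two append, so re-running the sequence replaces results instead of accumulating duplicates. Overwrite here clears the whole shared table, which is an assumption about semantics, not documented product behavior.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical "Source DB" rows: (record_id, tax_id).
SOURCE: List[Tuple[str, Optional[str]]] = [("r1", "123456789"), ("r2", None), ("r3", "12AB")]

# Common output format that every rule sharing a table must use.
SCHEMA = "(rule_name TEXT, record_id TEXT, passed INTEGER)"

Rule = Callable[[Optional[str]], bool]


def write_results(conn: sqlite3.Connection, table: str, rule_name: str,
                  rule: Rule, mode: str) -> None:
    """Run one rule over SOURCE and write one row per record to `table`."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown mode: {mode}")
    # Table names cannot be bound as SQL parameters; `table` is trusted config here.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} {SCHEMA}")
    if mode == "overwrite":
        conn.execute(f"DELETE FROM {table}")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)",
                     [(rule_name, rid, int(rule(value))) for rid, value in SOURCE])


# Sequence: rule 1 resets the shared table, rules 2 and 3 append to it.
SEQUENCE: List[Tuple[str, Rule, str]] = [
    ("not_null", lambda v: v is not None, "overwrite"),
    ("numeric", lambda v: v is not None and v.isdigit(), "append"),
    ("length_9", lambda v: v is not None and len(v) == 9, "append"),
]


def run(conn: sqlite3.Connection, table: str = "output_table_1") -> None:
    for name, rule, mode in SEQUENCE:
        write_results(conn, table, name, rule, mode)
    conn.commit()


conn = sqlite3.connect(":memory:")
run(conn)
run(conn)  # second run does not double the rows, because step 1 overwrites
print(conn.execute("SELECT COUNT(*), SUM(passed = 0) FROM output_table_1").fetchone())
```

Swapping the order so an appending rule runs first would leave stale rows behind, which is why sequencing and write mode have to be configured together.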
Data Validation Rules
Search, browse and view Data Validation Rules & associated assets
Users can browse, search and display details of published Rule Definitions, including usage by DataStage and Glossary assignments.
Data Validation Rules
View Data Validation Rules in lineage displays
Stage details include a reference to the Data Rule Definition and any changed rule logic. Lineage displays data flow through the job.
Data Validation Rules
Drill down from job level to Data Validation Rules details
Expanding the details of a job previews the data flow within the job. Data is pushed into and out of the Rule Stage via its connecting links.
New Address Verification Module
Capabilities
– Superior geocoding support for 240 countries / territories
– Improved verification, suggestion and correction results
– Bi-directional transliteration support
– Tightly integrated into QualityStage
– Support for most Information Server versions
– Extensible framework to support other features in the future, such as Address Certification
Benefits
– Reduced errors in shipping, mailing and other activities, resulting in lower costs
– Better customer service and increased revenue
– Increased business confidence when using enterprise data for critical decision making
New InfoSphere Data Quality Console
Unified environment to proactively increase Data Quality awareness
Components: Information Analyzer Discovery, Information Analyzer, Exception Manager, QualityStage, DataStage
Roles: Business Analyst, Data Analyst, DQ/ETL Developer, Steward
Assigning ownership for exception summaries
View summary / metadata information for Data Validation Rules and exceptions
New InfoSphere Standardization Rules Designer
Simplifying & accelerating the speed of cleansing data
Knowledge holders looking at the data … what they want to
Data-driven standardization when cleansing data
– Intuitive framework to design, maintain and execute standardization rules for data quality
– Web-based user interface lets users quickly begin the classification process by changing or adding value definitions for their data
– Drag-and-drop features let users manage the rules that handle their records without hand-writing any pattern action language (PAL) code
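The classification idea behind data-driven standardization can be sketched in Python: each token is looked up in a table of value definitions, mapped to a standard value and a class, and the classes form a pattern that later rules act on. Everything here is hypothetical: the table entries, the class symbols (chosen for illustration, not claimed to match the product's) and the sample addresses. It is not PAL and not the Standardization Rules Designer's internal model.

```python
from typing import Dict, List, Tuple

# Hypothetical classification table: raw token -> (standard value, class).
CLASSIFICATION: Dict[str, Tuple[str, str]] = {
    "ST": ("ST", "T"), "STR": ("ST", "T"), "STREET": ("ST", "T"),
    "AVE": ("AVE", "T"), "AVENUE": ("AVE", "T"),
    "APT": ("APT", "U"), "APARTMENT": ("APT", "U"), "UNIT": ("APT", "U"),
}


def classify(token: str) -> Tuple[str, str]:
    """Map one token to (standard value, class): table hit, number, word or other."""
    t = token.upper().strip(".,")
    if t in CLASSIFICATION:
        return CLASSIFICATION[t]
    if t.isdigit():
        return t, "^"
    if t.isalpha():
        return t, "+"
    return t, "@"


def standardize(text: str) -> Tuple[str, str]:
    """Return the token-class pattern and the standardized text."""
    pairs: List[Tuple[str, str]] = [classify(tok) for tok in text.split()]
    return " ".join(c for _, c in pairs), " ".join(v for v, _ in pairs)


print(standardize("123 Main Street, Apt. 4"))  # ('^ + T U ^', '123 MAIN ST APT 4')

# Adding a value definition changes how later records are classified.
print(standardize("12 Oak Rd"))                # ('^ + +', '12 OAK RD')
CLASSIFICATION["RD"] = ("RD", "T")
print(standardize("12 Oak Rd"))                # ('^ + T', '12 OAK RD')
```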
IBM Data Quality – other features and enhancements
Discovery
– Complete globalization
– RHEL + AIX support for engine (client: Windows)
– 64-bit
– Enhancements for life cycle & test data management (vol. projection)
Information Analyzer