
Information Governance and Data Quality


Academic year: 2021


(1)

© 2012 IBM Corporation

Information Governance and Data Quality

Stefano Mino – EU Sales Leader InfoSphere Information Integration

(2)


Ensuring Compliance | Increasing Complexity | Declining Quality | Protecting Privacy

• 1 trillion connected devices in the world
• $8.2 million annual loss by the average organization due to poor data quality
• $204 cost per compromised record
• $29.8 billion U.S. spending on governance, risk and compliance

Most organizations continue to struggle with data

CIOs must determine whether your Information Governance strategy adequately reflects the relationship to your overall information management initiative.

If the relationship is unclear, or the stated goals are different, work with the business to refactor your strategy.

(3)

Organizations continue to have Data Quality Challenges

• Compliance and transparency pressures increasingly highlight data quality issues
– No method to maintain high quality data

• Unreliable insights
– Low data quality leads to lack of trust and results in poor business decisions
– Inability to identify source of quality issues
– Unreliable insights are persisted to other strategic initiatives, which then base key business decisions on bad data

• High costs & negative customer satisfaction
– Organizations are recognizing that there can be both direct (missed revenue opportunity) and indirect (low customer satisfaction and high churn) financial costs from poor data quality.

(4)

Survey: Data quality software is viewed as critical technology . . .

79% of survey respondents indicated they had deployed their tools of choice in more than one project or deployment, as compared to only 58% in 2008.

2009 Survey on Data Quality Tools Highlights Broadening Deployments With Focus on Proven Functionality, Gartner, 14 August 2009/ID Number: G00170331

(5)

The Cost of Dirty Data

• 83% of Data Integration projects either overrun or fail
• Inaccurate or incomplete data is a leading cause of failure in business-intelligence and CRM projects
• 25% of time is spent resolving bad data
• Undetected defects will cost 10 to 100 times as much to fix upstream
• Low data quality costs companies $611 billion annually

Scrap and rework; increased costs

(6)


IBM Information Governance creates order out of information chaos

• Orchestrate people, process and technology toward a common goal
– Promotes collaboration
– Derive maximum value from information

Information Governance is the exercise of decision rights to optimize, secure and leverage data as an enterprise asset.

Governing the creation, management and usage of enterprise data is not an option any longer. It is:
• Expected by your customers
• Demanded by your executives
• Enforced by regulators/auditors

• Leverage information as an enterprise asset to drive opportunities
– Safeguarding information
– Ensure highest quality

(7)


Information Governance

Success requires governance across the “Information Supply Chain”

Govern: Quality, Security & Privacy, Lifecycle, Standards

Analyze, Integrate, Manage

Diagram labels: Transactional & Collaborative Applications, Business Analytics Applications, External Information Sources, Cubes, Big Data, Master Data, Content, Data, Streaming Information, Data Warehouses, Content Analytics

(8)


What is Information Governance?

IBM defines Information Governance as a holistic approach to managing and leveraging information for business benefits.

It encompasses information quality, information protection and information life cycle management.

(9)


Proactively leveraging information . . . to unlock value and manage risk

• Ensure information is understood and consistently defined.
• Increase the use and trust of information as an enterprise asset.
• Protect information, reduce risk and comply.

[Figure: data governance organization chart. Labels include Executive-Level Data Governance Bodies (Executive Sponsorship, Risk Data Council), Risk Data Governance Office (DGO) with Data Governance Program Manager, Technical Liaisons (4), Business Liaisons (4), Metadata Liaison (1) and Data Quality Reporting Liaison (1), Line of Business Stewardship Community (Lead Steward; Data Definition, Data Production, Data Usage and Quality Measurement Stewardship Functions), Data Quality Reporting Team, Project Teams and Virtual Teams.]

[Figures: process flow extracts 2.1.1 Build Team, 2.1.2 Build Common Definition (and its continuation) and 2.1.3 Build Data Quality Rules and Standards, with swim lanes for the Data Governance Council, Data Steward Committee, LOB/Functional Area Data Steward Coordinators, Data Domain Steward and Data Quality Team.]

(10)


Results from good Information Governance

• Understand your information
– Know what exists
– How is it related
– Ensure common understanding and definitions

• Contain costs
– Manage costs with continuous growth
– Retain information without growing retention costs

• Maximize value from your information
– Make decisions that you can trust
– Increase revenues
– Reduce costs

• Secure and Protect
– Keep information safe from internal and external threats
– Know who is accessing what information and why

• Comply with regulatory requirements
– Retention
– Security
– Filings
– Audits

(11)


Good governance requires process and accountability

IBM Information Governance Unified Process (diagram legend: Optional Steps, Required Steps)

1) Define Business Problem
2) Obtain Executive Sponsorship
3) Conduct Maturity Assessment
4) Build Roadmap
5) Establish Organizational Blueprint
6) Build Business Glossary
7) Understand Data
8) Create Metadata Repository
9) Define Metrics
10) Govern Data Quality
11) Govern Master Data
12) Govern Lifecycle of Information
13) Govern Security & Privacy
14) Govern Big Data
15) Measure Results

(12)


(13)

In Context: Real-time delivery of relevant information when and where it’s needed

Complete: Related information reconciled into a single and holistic view

Accurate: Complex and disparate data transformed, cleansed and delivered

Insightful: Derive meaning from information challenges

(14)

Align business and IT objectives using single platform that creates trusted information for use in key initiatives

Diagram labels: Sources, Business Initiatives, legacy, apps, dbs, xls, xml, flat, warehouse, z/OS, custom, BI, SAP, Warehouse, MDM, ERP, App Consolidation

Roles shown: Business Analysts, Executives, Enterprise Architects, Data Analysts & Architects, Subject Matter Experts, System Manager, Developer, DBA, System Architect, Data Steward

(15)


Example business case for data quality in marketing

A. Total number of customers in the marketing list: 950,000
B. Number of individual party matches: 40,000
C. Additional duplicate individuals who are double-counted as part of a household: 50,000
D. Total number of duplicate matches: 90,000
E. Number of annual marketing mailings per customer: 2
F. Cost per mailing: $3.25
G. Total avoidable cost of duplicate mailings (D x E x F): $585,000
H. Outbound telemarketing calls per customer per year: 4
I. Cost per outbound telemarketing call: $1.50
J. Total avoidable cost of outbound telemarketing calls (D x H x I): $540,000
K. Total avoidable cost of duplicate matches (G+J): $1,125,000
L. Cost to implement data quality tools: $500,000
M. Annual cost of full-time customer data steward: $200,000
N. Total cost of data quality solution (L+M): $700,000
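The arithmetic in this business case can be checked with a short script. This is only an illustrative sketch: the variable names are invented here, and the final net-benefit line (K minus N) is an assumption about how the two totals would be compared, since the slide stops at row N.

```python
# Inputs copied from rows B, C, E, F, H, I, L and M of the slide.
individual_matches = 40_000       # B
household_duplicates = 50_000     # C
mailings_per_customer = 2         # E
cost_per_mailing = 3.25           # F
calls_per_customer = 4            # H
cost_per_call = 1.50              # I
tool_cost = 500_000               # L
steward_cost = 200_000            # M

# D: the slide's 90,000 equals B + C.
duplicates = individual_matches + household_duplicates
mailing_savings = duplicates * mailings_per_customer * cost_per_mailing  # G = D x E x F
call_savings = duplicates * calls_per_customer * cost_per_call           # J = D x H x I
total_savings = mailing_savings + call_savings                           # K = G + J
solution_cost = tool_cost + steward_cost                                 # N = L + M

# Assumption (not on the slide): first-year net benefit is simply K - N.
net_benefit = total_savings - solution_cost

print(f"D = {duplicates:,}")
print(f"G = ${mailing_savings:,.0f}")
print(f"J = ${call_savings:,.0f}")
print(f"K = ${total_savings:,.0f}")
print(f"N = ${solution_cost:,.0f}")
print(f"K - N = ${net_benefit:,.0f}")
```

Keeping each row as a named variable makes it easy to rerun the case with an organization's own duplicate counts and unit costs.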

(16)


Put the right standards in place

Steps highlighted (diagram legend: Optional Steps, Required Steps)

6) Build Business Glossary
7) Understand Data
8) Create Metadata Repository
9) Define Metrics
15) Measure Results

(17)

For example, define “Active Subscriber”

Roles shown: IT Architect, Marketing Manager, Support Rep, CRM Project Manager, Business Intelligence Manager, ERP Project Manager, Business Analyst, Financial Officer, Compliance Officer, Sales Lead

• Mobile user who has used “any” service in the mobile network
• User who paid for the service at least 1 time in the past 90 days.
• Mobile user who has a phone plan, but not SMS
• Only post-paid customers, not pre-paid customers
• User who makes at least 1 call over the period of 90 days
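Competing definitions like these produce different counts from the same data, which is why a shared glossary entry matters. The sketch below applies three of the definitions above to a handful of made-up subscriber records; the `Subscriber` fields and the sample values are hypothetical and chosen only to make the disagreement visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subscriber:
    # Hypothetical fields, invented for this illustration.
    plan: str                               # "post-paid" or "pre-paid"
    days_since_last_payment: Optional[int]  # None if no payment on record
    calls_last_90_days: int


subscribers = [
    Subscriber("post-paid", 20, 0),
    Subscriber("pre-paid", 120, 5),
    Subscriber("post-paid", None, 3),
    Subscriber("pre-paid", 10, 2),
    Subscriber("post-paid", 200, 1),
]

# Three of the candidate definitions of "Active Subscriber" as predicates.
definitions = {
    "paid in past 90 days": lambda s: (
        s.days_since_last_payment is not None and s.days_since_last_payment <= 90
    ),
    "post-paid only": lambda s: s.plan == "post-paid",
    "at least 1 call in 90 days": lambda s: s.calls_last_90_days >= 1,
}

counts = {
    name: sum(1 for s in subscribers if rule(s))
    for name, rule in definitions.items()
}

for name, count in counts.items():
    print(f"{name}: {count} active subscribers")
```

With this sample, the three definitions report 2, 3 and 4 active subscribers respectively, so any report built on the term depends on which definition its author had in mind.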

(18)


Understand your information

• Data can be distributed over multiple applications, databases and platforms
• Relationships are complex and poorly documented
• Relationships are not well understood


(19)


Determine lineage of data

• Credit Card Number: “a unique identification number issued to each card holder and unique to each card printed.”
– 1212 4545 6525 3092
– 1212454565253092
– 0000000085426938

• Profit Amount: “a currency value that is calculated by combining data from the Customer Master database and Wholesale Inventory applications . . .”
– Calculation included on monthly report
– $85,426,938

• View end-to-end lineage including design metadata, operational metadata, user-defined metadata
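One reading of the sample values above is that a raw value cannot reveal which business term it belongs to: the zero-padded number and the profit amount contain the same digits. The sketch below illustrates that reading. The `lineage` dictionary, its column names and the "Card issuing system" source are hypothetical stand-ins for lineage metadata, not a real metadata repository.

```python
import re


def digits_only(value: str) -> str:
    """Strip everything except digits (spaces, currency symbols, commas)."""
    return re.sub(r"\D", "", value)


# Sample values as they appear on the slide.
card_spaced = "1212 4545 6525 3092"
card_compact = "1212454565253092"
zero_padded = "0000000085426938"
profit_amount = "$85,426,938"

# Two renderings of the same card number normalize to the same string.
same_card = digits_only(card_spaced) == digits_only(card_compact)

# The zero-padded value and the profit amount are numerically equal, so the
# value alone cannot say which business term it represents.
ambiguous = int(digits_only(zero_padded)) == int(digits_only(profit_amount))

# Hypothetical lineage metadata: column -> (business term, upstream sources).
lineage = {
    "RPT.MONTHLY.AMT": ("Profit Amount", ["Customer Master", "Wholesale Inventory"]),
    "CRM.CARD.NBR": ("Credit Card Number", ["Card issuing system"]),
}


def describe(column: str) -> str:
    """Return a one-line lineage summary for a column."""
    term, sources = lineage[column]
    return f"{column}: {term}, derived from {', '.join(sources)}"


print(same_card, ambiguous)
print(describe("RPT.MONTHLY.AMT"))
```

The design point is that the meaning sits in the metadata attached to a column, not in the value itself.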

(20)

IBM Data Quality Value Life Cycle

Stages: Assess & Discover, Specialized validation, Cleanse & Enrich, “Master”, Monitor / Track

Align with Business Objectives; report & deliver insight

Shared metadata, connectivity & infrastructure

• InfoSphere Information Analyzer: validate & monitor
• InfoSphere QualityStage: cleanse & enrich

(21)

Analyze source data quality and monitor adherence to integration and quality rules

Requirements
• Perform data quality assessment
• Define business rules to monitor data quality
• Establish stewards for governance of data quality

Benefits
• Identify data quality issues early to reduce project risks
• Monitor quality metrics over time for compliance
• Create business confidence with trusted information

(22)

Applying Information Analyzer

The solution perspective in a variety of use cases

Diagram labels: Govern, Analyze, Integrate, External Sources, Internal Data Sources, Master Data Management, Information Delivery, BI Applications, Packaged Applications, Information Analyzer, Packaged App., Data Warehouse

• Monitor quality at the source to address issues where information originates
• Monitor your trusted systems and their consistency with sources through transformations
• Report status & progress to the business
• Supply metrics to governance initiative

(23)

Data Quality: Pervasive, Progressive, Continuous

Information Analyzer supports the full spectrum across all levels

Diagram labels: Define (bus.-driven), Test, Deploy; Business Measured, Generic, Business Driven, Business Aligned; Common Measurements, Data Rules, Metrics, DQ Dashboard + Reports

Example: tax-id field has many Nulls. Rule: tax-id not Null And not default
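The tax-id rule can be pictured as a simple predicate evaluated over a column. This is a plain-Python sketch, not Information Analyzer rule syntax; the set of "default" placeholder values is an assumption, since the slide does not say what counts as a default.

```python
# Hypothetical placeholder values treated as "default" for this sketch.
DEFAULT_TAX_IDS = {"", "000000000", "999999999"}


def tax_id_passes(tax_id):
    """Rule from the slide: tax-id is not null and not a default value."""
    return tax_id is not None and tax_id.strip() not in DEFAULT_TAX_IDS


# Made-up sample column with nulls and one placeholder.
tax_ids = ["123456789", None, "000000000", "987654321", None]

passed = sum(1 for t in tax_ids if tax_id_passes(t))
pass_rate = passed / len(tax_ids)

print(f"{passed} of {len(tax_ids)} records pass ({pass_rate:.0%})")
```

Checking for defaults as well as nulls matters because placeholder values otherwise make a column look fully populated.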

(24)

Common Data Quality Dimensions and Measurements

• Domain quality: completeness, validity, length & format
• Cross-domain fitness
– Redundancy
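The domain-quality measurements named above (completeness, length and format) can be illustrated with a small column profile. This is a generic sketch, not the product's profiling implementation: the `profile` function, the 9/A pattern notation and the sample postal codes are all assumptions made for the example.

```python
from collections import Counter


def format_pattern(value: str) -> str:
    """Map digits to 9 and letters to A, keeping other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Summarize completeness, length distribution and format distribution."""
    present = [v for v in values if v is not None and v != ""]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "lengths": Counter(len(v) for v in present),
        "formats": Counter(format_pattern(v) for v in present),
    }


# Made-up postal code column with a null, an empty string and mixed formats.
postal_codes = ["10115", "80331", None, "SW1A 1AA", "", "1011"]
result = profile(postal_codes)

print(f"completeness: {result['completeness']:.2f}")
print("lengths:", dict(result["lengths"]))
print("formats:", dict(result["formats"]))
```

A validity check would then compare each observed format or value against the set the business accepts; outliers such as the four-digit code above are candidates for review.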

(25)

Data Rules

Specify consistent & re-usable data rules, driven by business

Examples of Rules:

• The Gender field must be populated and must be in the list of accepted values
• The Social Security Number must be numeric and in the format 999-99-9999
• If Date of Birth Exists AND Date of Birth > 1900-01-01 and < TODAY Then Customer Type Equals ‘P’
• The Bank Account Branch ID is valid in the Branch Reference master list

Diagram labels: Business users; Data Rule (“The account number must meet the following condition: …”); driven by; validated against
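The four example rules can be written as ordinary checks on a record. The sketch below is plain Python rather than the rule language of any product; the accepted gender codes, the branch master list and the record field names are hypothetical.

```python
import re
from datetime import date

ACCEPTED_GENDERS = {"F", "M", "U"}   # hypothetical list of accepted values
BRANCH_MASTER = {"B001", "B002"}     # hypothetical branch reference list
SSN_FORMAT = re.compile(r"\d{3}-\d{2}-\d{4}")


def check_record(rec, today=None):
    """Evaluate the four example rules; True means the rule is satisfied."""
    today = today or date.today()
    results = {}

    # Gender must be populated and in the list of accepted values.
    results["gender"] = rec.get("gender") in ACCEPTED_GENDERS

    # SSN must match the format 999-99-9999.
    results["ssn"] = SSN_FORMAT.fullmatch(rec.get("ssn") or "") is not None

    # If date of birth exists and lies between 1900-01-01 and today,
    # customer type must equal 'P'. Otherwise the rule does not apply.
    dob = rec.get("date_of_birth")
    if dob is not None and date(1900, 1, 1) < dob < today:
        results["customer_type"] = rec.get("customer_type") == "P"
    else:
        results["customer_type"] = True

    # Branch ID must exist in the branch reference master list.
    results["branch"] = rec.get("branch_id") in BRANCH_MASTER
    return results


good = {"gender": "F", "ssn": "123-45-6789", "date_of_birth": date(1980, 5, 1),
        "customer_type": "P", "branch_id": "B001"}
bad = {"gender": "", "ssn": "123456789", "date_of_birth": date(1980, 5, 1),
       "customer_type": "C", "branch_id": "B999"}

print(check_record(good, today=date(2012, 1, 1)))
print(check_record(bad, today=date(2012, 1, 1)))
```

Returning one result per rule, rather than a single pass/fail flag, keeps each rule reusable and lets failures be reported separately.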

(26)

Measure results vs. targets

• View Metric & Benchmark summaries
• Organize Metrics and Rules within user-defined folders
• Create Metrics across single or multiple Data Rules

(27)

Comprehensive reporting and tracking environment

From high level dashboard to flexible views

• Quickly assess the health of your information in summary dashboard view
• Drill into specific data quality assessment results
• Understand the details in multiple perspectives and based on flexible configuration

(28)

Validating Data Rules in InfoSphere QualityStage/DataStage

• Embed Information Analyzer Data Rule Definitions in DataStage/QualityStage jobs
• Create new data rules through the DataStage/QualityStage Designer
• Enables an integrated and comprehensive development environment across QualityStage, DataStage and Information Analyzer

(29)

Standardize, cleanse and deduplicate data, ensuring a complete, accurate view of information

• Removes duplicates
• Cross-references matching records
• Survives a single, complete record
• Validates and enriches data

Requirements
• Cleanse data
• Manage duplicate data
• Enable ongoing quality

Benefits
• Highly accurate for fast ROI
• Resolution of data quality issues
• Standardization of data formats
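The standardize, match, cross-reference and survive sequence can be pictured with a toy example. This sketch uses a deliberately naive exact match on a standardized key and a "first non-empty value wins" survivorship rule; both are simplifying assumptions and say nothing about how QualityStage itself matches or survives records. All record fields and values are made up.

```python
from collections import defaultdict


def match_key(rec):
    """Standardize name and postal code into a comparison key."""
    name = " ".join(rec["name"].lower().split())
    postal = rec["postal_code"].replace(" ", "").upper()
    return (name, postal)


def survive(group):
    """Build one surviving record: first non-empty value per field."""
    survivor = {}
    for field in ("name", "postal_code", "email"):
        values = [r[field] for r in group if r.get(field)]
        survivor[field] = values[0] if values else None
    return survivor


records = [
    {"id": 1, "name": "Ann Lee", "postal_code": "SW1A 1AA", "email": ""},
    {"id": 2, "name": "ann  lee", "postal_code": "sw1a1aa", "email": "ann@example.com"},
    {"id": 3, "name": "Bob Roy", "postal_code": "10115", "email": ""},
]

# Group records whose standardized keys are identical.
groups = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec)

golden = []   # surviving records
xref = {}     # source record id -> index of its surviving record
for members in groups.values():
    golden_id = len(golden)
    golden.append(survive(members))
    for member in members:
        xref[member["id"]] = golden_id

print(golden)
print(xref)
```

Records 1 and 2 collapse into one surviving record that keeps the email found only on record 2, while the cross-reference preserves the link back to both source rows.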

(30)


IBM InfoSphere – Your Trusted Platform for Trusted Information

Intelligent: Prebuilt, Automated, Proactive
Integrated: Integrated capabilities designed to address enterprise use cases
Comprehensive: Covering the full information supply chain

InfoSphere is a market leader in every category of Information Integration and Governance

(31)


Next steps in Information Governance

• IBM Information Governance Council
– Established Information Governance Council over five years ago
– Developed Maturity Model for Information Governance leveraged by over 250 customers
– Community now exceeds 1500 members
– Join the community: www.infogovcommunity.com
– Self assessment
• Workshops and assessments
• For more information: www.ibm.com/informationgovernance

(32)

BACKUP

WHAT IS NEW IN INFOSPHERE

INFORMATION SERVER FOR

(33)

Key Data Quality Enhancements

Life cycle stages: define objectives, assess & discover, validate, cleanse & enrich, “master”, monitor / track

• New Information Governance Rules & Policies
• New Standardization Rules Designer
• New Data Quality Console
• New Address Verification Module
• Extended platform support
• Data Validation Rule Impact Analysis
• Data Validation Rule Sequencing

(34)

Data Validation Rules

Flexible Output Table Configuration, Sequencing & Impact Analysis

• Flexible configuration of output tables for Data Validation Rules (naming, append/overwrite)
• Registration & reuse of output tables
• Sequencing of Data Validation Rules
• Advanced web-based Data Validation Rule display incl. lineage and impact analysis

(35)

Data Validation Rules

User named output table configuration & sequencing

• Define name of output tables for Data Validation Rules
– Simple user-named tables: single table for single rule
– Advanced user-named tables: one or more rules can update the same table (common format required)
– Configure whether to append / overwrite values in output table
• Workflow example (diagram labels): Source DB; Data Validation Rule 1, Data Validation Rule 2, Data Validation Rule 3; Output Table 1
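The append/overwrite behavior and the shared output table can be sketched in memory. This assumes a reading of the workflow in which three rules read the same source and write exceptions to one table in a common format; `run_rule`, the table name and the sample rows are hypothetical and do not reflect the product's interfaces.

```python
# In-memory stand-in for output tables: table name -> list of exception rows.
output_tables = {}


def run_rule(rule_name, predicate, rows, table, mode="append"):
    """Write rows that fail the rule to a named output table."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown mode: {mode}")
    # Common format: every exception row carries the same two columns.
    exceptions = [
        {"rule": rule_name, "row_id": row["id"]}
        for row in rows
        if not predicate(row)
    ]
    if mode == "overwrite":
        output_tables[table] = exceptions
    else:
        output_tables.setdefault(table, []).extend(exceptions)
    return len(exceptions)


# Made-up source rows.
source_rows = [
    {"id": 1, "tax_id": None, "age": 30},
    {"id": 2, "tax_id": "123", "age": -1},
    {"id": 3, "tax_id": "456", "age": 40},
]

# Sequence: the first rule resets the table, later rules append to it.
run_rule("rule_1", lambda r: r["tax_id"] is not None, source_rows,
         "output_table_1", mode="overwrite")
run_rule("rule_2", lambda r: r["age"] >= 0, source_rows, "output_table_1")
run_rule("rule_3", lambda r: r["id"] > 0, source_rows, "output_table_1")

print(output_tables["output_table_1"])
```

Running the first rule in overwrite mode and the rest in append mode is one way sequencing and output configuration interact: the order of execution determines what the shared table contains at the end of a run.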

(36)

Data Validation Rules

Search, browse and view Data Validation Rules & associated assets

User may browse, search and display details of published Rule Definitions, including usage by DataStage and Glossary assignments.

(37)

Data Validation Rules

View Data Validation Rules in lineage displays

Stage details include a reference to the Data Rule Definition and the changed Rule Logic. Lineage displays data flow through the Job.

(38)

Data Validation Rules

Drill down from job level to Data Validation Rules details

Expanding the details of a Job previews the data flow within the Job. Data is pushed into and out of the Rule Stage via its connecting links.

(39)

New Address Verification Module

Capabilities

• Superior GeoCoding support for 240 countries / territories
• Improved verification, suggestion and correction results
• Bi-directional Transliteration support
• Tightly integrated into QualityStage
• Support for most Information Server versions
• Extensible framework to support other features in the future such as Address Certification

Benefits

• Reduced errors in shipping, mailing, and other activity resulting in lower cost
• Better customer service and increased revenue
• Increased business confidence when using enterprise data for critical decision making

(40)

New InfoSphere Data Quality Console

Unified environment to proactively increase Data Quality awareness

Diagram labels: Information Analyzer, Discovery / Information Analyzer, Exception Manager, QualityStage, DataStage

Roles shown: Business Analyst, Data Analyst, DQ/ETL Developer, Steward

Life cycle stages: define objectives, assess & discover, validate, cleanse & enrich, “master”, monitor / track, report

(41)

New InfoSphere Data Quality Console

(42)

New InfoSphere Data Quality Console

(43)

New InfoSphere Data Quality Console

Assigning ownership for exception summaries

Back up

(44)

New InfoSphere Data Quality Console

View summary / metadata information for Data Validation Rules and exceptions

Back up

(45)

New InfoSphere Data Quality Console

(46)

New InfoSphere Data Quality Console

(47)

New InfoSphere Data Quality Console

(48)

New InfoSphere Standardization Rules Designer

Simplifying & accelerating the speed of cleansing data

validate “master” monitor / track assess & discover define objectives cleanse & enrich

Knowledge holders

looking at the data

what they want to

(49)

New InfoSphere Standardization Rules Designer

Data driven standardization when cleansing data

• Intuitive framework to design, maintain and execute standardization rules for data quality
• Web based user interface allows users to quickly begin the Classification process by changing or adding value definitions to their data.
• Drag and drop features allow users to easily manage rules that handle their records without needing to hand write any pattern action language (PAL) code.

(50)
(51)
(52)
(53)

IBM Data Quality – other features and enhancements

• Discovery
– Complete globalization
– RHEL + AIX support for engine (client: Windows)
– 64-bit
– Enhancements for life cycle & test data management (vol. projection)
• Information Analyzer

(54)
