Kevin Lewis – Partner
Enterprise Data Management COE
Barb Swartz – Account Manager
Teradata Government Systems
DATA GOVERNANCE AND DATA QUALITY
Objectives of the Presentation
• Show that Data Governance and Data Quality are part of a larger Enterprise Data Management (EDM) function
• Provide a process framework for effective Data Quality Management
• Explain the role of Data Governance and Stewardship in a Data Quality function
• Provide advice on aligning a Data Governance program to an EDM framework
A Path to Integrated and Trusted Information
[Diagram: Data Governance; Data Stewardship; Integrated and Trusted Information]
• Data Governance – The practice of organizing and implementing principles, policies, procedures and standards for the effective use of data
• Data Stewardship – Continual, day-to-day activities of creating, using, and retiring data
• Data Quality – Ensure data is fit for its intended use
• Data Integration – Includes Data Acquisition (ETL/ELT) processing to combine transaction and master data to provide a consistent, meaningful, and trusted view of the data across business units and subject areas
• Data Security and Privacy – Information security, data privacy and regulatory compliance across data subject areas, including monitoring and audit capabilities
• Metadata Management – The people, processes and technical components necessary to ensure that metadata is easily accessible, consistent, current, accurate, timely and complete
• Master Data Management – Management of master data domains, such as Product and Customer data, that provide context for transactional data
• Data Architecture – The logical and physical data modeling plus other activities needed to understand
[Diagram, continued: People, Processes, and Technology; Data Integration; Data Architecture; Data Quality; Master Data Mgmt; Metadata Mgmt; Data Security and Privacy]
Data Governance, Data Stewardship, and Enterprise Data Management
Data Governance provides oversight for Enterprise Data Management (EDM)
Data Stewardship provides the day-to-day business involvement for EDM activities
Data Quality
The core dimensions of data quality are:
• Accuracy – data represents reality correctly
• Completeness – data gaps are minimized and data subjects are covered adequately
• Timeliness – data is stored in the system within an acceptable time from the business event
• Consistency – data is defined and reported with the same meaning and values across the enterprise
Data Governance determines the focus of data quality improvements based on business value
Data Stewards provide business understanding of assigned data subjects
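The four core dimensions can each be expressed as a simple ratio. The following is a minimal sketch, not a method from this presentation: the function names, the record layouts, and the choice to score each dimension as a fraction between 0 and 1 are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def accuracy(stored, reference):
    """Fraction of reference keys whose stored value matches the trusted value."""
    if not reference:
        return 1.0
    matches = sum(1 for key, value in reference.items() if stored.get(key) == value)
    return matches / len(reference)

def completeness(records, required_fields):
    """Fraction of required field slots that are populated (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1 for record in records for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total

def timeliness(events, max_lag):
    """Fraction of (event_time, stored_time) pairs stored within max_lag."""
    if not events:
        return 1.0
    on_time = sum(
        1 for event_at, stored_at in events
        if timedelta(0) <= stored_at - event_at <= max_lag
    )
    return on_time / len(events)

def consistency(system_a, system_b):
    """Fraction of keys present in both systems that carry the same value."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 1.0
    return sum(1 for key in shared if system_a[key] == system_b[key]) / len(shared)

# Hypothetical data: one balance is held as 10,000 in ABC but 12,500 in XYZ.
abc = {"acct-1": 10000, "acct-2": 10000}
xyz = {"acct-1": 10000, "acct-2": 12500}
print(consistency(abc, xyz))  # 0.5
```

Which of these scores matters most, and what threshold counts as acceptable, is a business decision, which is where Data Governance and Data Stewards come in.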
| Dimension | Description | Conformance | Non-Conformance |
| Accuracy | A measure of information correctness | A balance of $10,000 is stored as a balance of $10,000. | A balance of $10,000 is stored as a balance of $12,500. |
| Consistency | A measure of the degree of conflicts that exist in situations with redundant data | A balance of $10,000 in the ABC system is also stored as $10,000 in the XYZ system. | A balance of $10,000 in the ABC system is also stored as $12,500 in the XYZ system. |
| Entirety | A measure of the quantities of entities created, versus the real world or the number of actual events | All phone calls that were made were recorded and stored for billing. | Calls to a particular NPA-NNX were not recorded due to a switch profile problem. Revenue for these calls will be lost. |
| Breadth | A measure of the amount of information captured about an object or event | All information about a specific call is captured including duration, start and stop time, origination and termination information, billing information, network information, etc. | None of the network related information for a specific call is captured. Nothing is known about how the call was handled by the network. |
| Completeness | A measure of information gaps within a specific entity occurrence | Name, age, and occupation are known for all customers. | Name and age are known for all customers but occupation is known for only 50% of the customers. |
| Uniqueness | A measure of unnecessary information replication | Customer information is stored once for each customer. | Certain customers’ records are duplicated due to variations in the spelling of the name, alternate address, etc. The records are not linked in any way. |
| Interpretability | A measure of semantic standards being applied | A date is stored as 11 June 2002. | A date stored as 11062002 is interpreted as November 06, 2002. |
| Timeliness | A measure of how current a record is | All customer addresses represent the current place of dwelling. | Many customers have changed their address without informing the company. |
| Precision | A measure of exactness | The amount of tax due for this specific transaction is $0.104. | The amount of tax due for this specific transaction is stored as $0.10. |
| Depth | A measure of the amount of entity or event history that is retained | A complete history of orders, bills, and payments is retained for all customers. | Orders, bills, and payment information is only retained for one year. Each month, the prior year records are deleted for that month to make room for the new information. |
| Integrity | A measure of validity with respect to another item of related information | | A call detail record contains a from number of (404) 240-9999. The Terminating Point Master table indicates that due to an area code split, the 240 NNX is now in the 770 NPA. |
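Three of the non-conformance examples in the table (Interpretability, Precision, Uniqueness) can be reproduced in a few lines. This is an illustrative sketch only: the function names, the two candidate date formats, and the crude name normalization are assumptions, not part of the presentation.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

def possible_dates(raw, formats=("%d%m%Y", "%m%d%Y")):
    """Return every distinct date `raw` could mean under the candidate formats."""
    found = set()
    for fmt in formats:
        try:
            found.add(datetime.strptime(raw, fmt).date())
        except ValueError:
            pass  # raw is not a valid date under this format
    return sorted(found)

def precision_loss(exact, places="0.01"):
    """Amount lost when `exact` is stored rounded to the given decimal places."""
    stored = exact.quantize(Decimal(places), rounding=ROUND_HALF_UP)
    return exact - stored

def duplicate_groups(names):
    """Group names that become identical after lowercasing and dropping
    punctuation and spaces. True spelling variants need fuzzier matching."""
    groups = {}
    for name in names:
        key = "".join(ch for ch in name.lower() if ch.isalnum())
        groups.setdefault(key, []).append(name)
    return [group for group in groups.values() if len(group) > 1]

# Interpretability: 11062002 is valid as both 11 June 2002 and 6 November 2002.
print(possible_dates("11062002"))
# Precision: tax of $0.104 stored as $0.10 loses $0.004 per transaction.
print(precision_loss(Decimal("0.104")))  # 0.004
# Uniqueness: the same customer entered three slightly different ways.
print(duplicate_groups(["Jon Smith", "jon smith", "JON  SMITH.", "Ann Lee"]))
```

A value with more than one possible reading, as in the date example, signals that a semantic standard (an agreed storage format) is missing rather than that any single record is wrong.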