Leader in Data Quality and Data Integration
www.dataflux.com 877–846–FLUX
International +44 (0) 1753 272 020
A DataFlux White Paper
Prepared by:
Mike Ferguson
Data Ownership and Enterprise Data Management:
Implementing a Data Management Strategy (Part 3)
Introduction
So far in this three-part series on data ownership, I have discussed what data ownership is, why it is important, what the key requirements of enterprise data management (EDM) are and how companies can address the data management problem by standardizing on a suite of technologies, which I referred to as an EDM suite.
In this, the third and final paper in the series, I look at what needs to be done from a strategy perspective to establish personnel and procedures for enterprise data management, and what needs to be done to leverage the technologies available in an EDM suite for maximum return on investment.
Enterprise Data Management Strategies
In the first paper of this series, we outlined three key requirements for enterprise data management. These requirements are:
• Establish a common suite of technologies for end-to-end data management
• Dedicate IT personnel to enterprise data management
• Establish policies for data governance
Having looked at the first of these already in the second paper, we now turn our attention to organizational structure and data governance – concepts that are fundamental to any data management strategy.
Organizational Structures for Enterprise Data Management
One of the key appointments any company can make to help get its data under control is the position of Chief Data Architect. This position is often overlooked in IT and is sometimes not well understood by the business. Where it does exist, the person who holds it must have a business mandate to cause change so that data can be brought under control. Fundamentally, the job of a Data Architect is to understand how data is used in the business on an enterprise-wide basis and to formally define the data used. This individual is also responsible for setting policies and procedures for the use of that data, for maintaining data quality, and for ensuring a common, consistent understanding of what data means. Ideally, a Data Architect should have extensive experience in the vertical industry in which he or she works, so that he or she can clearly discuss data in the context of its business use. Data Architects must also have expertise in data management skills such as:
• Implementing data standards and establishing policies for developers and business users, including defining standard enterprise-wide data vocabularies
• In-depth understanding of the relational model and navigating XML schemas
• Data modeling and modeling techniques such as normalization and star schema multi-dimensional modeling, as well as some fluency in the use of data modeling tools
• Logical and physical database design
• Data profiling and defining rules for data content cleanup
• Understanding of the requirements that regulations and legislation impose on data for the purposes of compliance
Ideally, data architects should have an enterprise-wide remit in the sense that they need to operate across all lines of business when managing data. This is especially important in setting strategy and patterns (best practices) around specific data management processes such as:
• Master data management
• Data profiling and data monitoring
• Data migration and consolidation
• Data replication and change data capture
• Data synchronisation
• Data federation
• Data warehousing and data aggregation
• Data security
• Taxonomy design
Many companies are starting to create centralized IT expertise in business integration by creating Integration Competency Centers so that IT professionals responsible for different types of integration are able to coordinate their work. The data architect is at the center of data management, data quality and data integration and should be a key member of any integration competency center initiative. Figure 1 shows five levels of business integration. Data and metadata integration (and management) underpin and are a key piece of any business integration initiative.
Figure 1 - Five levels of business integration
(Diagram: the five levels are data and metadata integration, application integration, business process integration, people integration and user interface integration. Integration is happening at all of these levels, and coordinating it to achieve a strategic objective such as “Reduce operational costs” requires an Integration Competency Centre. Technologies shown: ETL, EII, DQ, master data management and content management; EAI and SOA integration platforms; business process management software; collaboration tools; enterprise portal software.)
Figure 2 shows how such an enterprise data management team operates. The first thing to notice is the consolidation of IT professionals responsible for data management and data integration in operational, BI and unstructured content management systems into a single team. This enterprise data management team includes the users of the EDM suite of technologies, and this team has a responsibility to set standards and support the other IT professionals working in specific lines of business throughout the enterprise. People in this team are EDM technology experts.
Figure 2 - How an Enterprise Data Management team should operate
(Diagram: EDM in an Integration Competency Centre, where a federated organisational structure is worth considering. A dedicated EDM team sits in the corporate Integration Competency Centre, formed by merging the operational, BI and content management IT data integration teams, with an executive sponsor and a Data Governance Steering Committee. Responsibilities shown for the enterprise data architects: data naming and definition standards, the enterprise data model, data security, data integration development and management policies, master data management, taxonomy design and integration with other technologies. Other labels, shown alongside the business units: content management, community taxonomy maintenance, data modelling using common data definitions, and data integration templates.)
As an example, if IT professionals in different lines of business wish to create data models then these models would be constructed from standard definitions and entities made available to them by the EDM team. The same applies if there is any development of data. It is also important that such a team is backed by an executive sponsor who participates with a data governance steering committee. At the risk of being criticized for suggesting yet another steering committee, I would at least argue that in today’s climate of much tighter regulations, CFOs are the likely sponsors as they take compliance very seriously. There also appears to be no shortage of executives lining up to participate in such a steering committee. Having a Compliance Manager on such a committee is also important.
Enterprise Metadata Management and Integration
Another key element of an enterprise data management strategy is data governance. This is about defining data standards including common data names and data definitions (common metadata), common policies, patterns and processes for enterprise data quality and data integration development, the construction of an enterprise data model, and defining policies and processes for master data management (MDM).
Data standardization requires that a shared business vocabulary (SBV) is established. Setting up a shared business vocabulary involves identifying and defining data used in the enterprise, creating a common set of enterprise-wide standard data definitions for that data and then mapping (cross referencing) disparate data definitions to these common definitions. More specifically, the SBV involves incrementally defining a set of enterprise-wide common data names, common data definitions, common data integrity rules, common reference data (e.g., code set values), common mappings and common transformations for all master data, transactional data, dimensional data and metrics. The SBV forms the base of an enterprise-wide data standard. It is fundamental to the success of commonly understood data, master data management and data integration.
These common data definitions can be used in:
• Data models to get consistency across multiple models
• Data integration tools (ETL and EII)
• Application integration technologies (message brokers and ESBs)
• Business views of reporting tools
• XML mark-up tags
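To make the idea of a shared business vocabulary more concrete, the following is a minimal sketch of how common data names, definitions, integrity rules and mappings might be held together in one structure. It is an illustration only, not a description of any EDM product: the class names, the system names (crm, billing) and the field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


def always_valid(value: object) -> bool:
    """Default integrity rule: accept any value."""
    return True


@dataclass
class CommonDataItem:
    """One entry in the shared business vocabulary (hypothetical structure)."""
    name: str                                   # common data name
    definition: str                             # common business definition
    integrity_rule: Callable[[object], bool] = always_valid


@dataclass
class SharedBusinessVocabulary:
    items: Dict[str, CommonDataItem] = field(default_factory=dict)
    # (system, local data name) -> common data name
    mappings: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def define(self, item: CommonDataItem) -> None:
        self.items[item.name] = item

    def map_local(self, system: str, local_name: str, common_name: str) -> None:
        # A mapping may only point at a name that is already defined.
        if common_name not in self.items:
            raise KeyError(f"{common_name!r} is not defined in the SBV")
        self.mappings[(system, local_name)] = common_name

    def to_common(self, system: str, record: dict) -> dict:
        """Rename a system-specific record into SBV names and apply integrity rules.

        Raises KeyError for an unmapped local name and ValueError for a rule violation.
        """
        result = {}
        for local_name, value in record.items():
            common_name = self.mappings[(system, local_name)]
            if not self.items[common_name].integrity_rule(value):
                raise ValueError(f"{common_name}={value!r} violates its integrity rule")
            result[common_name] = value
        return result


# Hypothetical vocabulary and two application-specific vocabularies mapped to it.
sbv = SharedBusinessVocabulary()
sbv.define(CommonDataItem("customer_id", "Unique identifier of a customer",
                          lambda v: isinstance(v, int) and v > 0))
sbv.define(CommonDataItem("customer_name", "Legal name of the customer"))
sbv.map_local("crm", "CUST_NO", "customer_id")
sbv.map_local("crm", "CUST_NM", "customer_name")
sbv.map_local("billing", "acct_holder_id", "customer_id")

common = sbv.to_common("crm", {"CUST_NO": 42, "CUST_NM": "Acme Ltd"})
```

The point of the sketch is that the definitions and the cross-references live in one place, so any tool that needs them can read the same metadata rather than keeping its own copy.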
Rendering data (e.g., in XML form) using standard XML tags based on SBV data names means that data can be presented using data names that are commonly understood by users. In addition, if data is made available for consumption by applications in the same way, then the data is managed in a consistent unambiguous fashion as it travels throughout the enterprise and as it is prepared for presentation. To explain why this is important, consider Figure 3. This shows a common problem that often arises when trying to integrate Performance Management products with multiple lines of business intelligence systems to calculate key enterprise level performance metrics.
Figure 3 - SBVs better integrate Performance Management products with multiple lines of Business Intelligence systems
(Diagram: how do you drill down from CPM products when metrics definitions are inconsistent? A CPM product defines a Total Revenue KPI over three BI systems, two custom-built data marts and a packaged analytic application, each with its own application, BI tool and DBMS metadata, whose metrics are named Revenue, Total Sales and Turnover. Common definitions are critical to BI integration.)
If each underlying BI system has been built independently, then what happens when you have three different metrics in three different BI systems called Revenue, Total Sales and Turnover, and you want to create a Key Performance Indicator called Total Revenue? Do you think business users understand the difference between these metrics? Worse still, do you think an IT developer knows the difference? The problem here is obvious. It’s ambiguous and prone to misunderstanding. This misunderstanding can lead to erroneous reporting and opens up the door for potentially incorrect interpretation of data and incorrect decision making.
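One way to remove this ambiguity is to make the mapping from each local metric name to the common metric explicit, and to refuse to calculate the KPI from anything that has not been mapped. The sketch below assumes, purely for illustration, that the three metrics have been agreed to mean the same thing and come from different lines of business, so that they can be summed. The system names and the function are hypothetical.

```python
# Hypothetical mappings agreed by the data governance team:
# (BI system, local metric name) -> common SBV metric name.
METRIC_MAPPINGS = {
    ("sales_mart", "Revenue"): "total_revenue",
    ("finance_mart", "Total Sales"): "total_revenue",
    ("packaged_app", "Turnover"): "total_revenue",
}


def total_revenue_kpi(readings):
    """Sum metric readings, but only those explicitly mapped to total_revenue.

    readings is an iterable of (system, metric_name, value) tuples.
    An unmapped metric raises ValueError instead of being silently included.
    """
    total = 0.0
    for system, metric_name, value in readings:
        common_name = METRIC_MAPPINGS.get((system, metric_name))
        if common_name != "total_revenue":
            raise ValueError(
                f"{system}.{metric_name} has no agreed mapping to total_revenue"
            )
        total += value
    return total


kpi = total_revenue_kpi([
    ("sales_mart", "Revenue", 100.0),
    ("finance_mart", "Total Sales", 250.0),
    ("packaged_app", "Turnover", 50.0),
])
```

Failing loudly on an unmapped metric is a deliberate choice here: it forces the question of what a metric means to be answered once, by the people who own the definition, rather than guessed by each developer.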
A best practice would be to prevent this by establishing a shared business vocabulary across all BI systems and business views. This is done in some companies but not necessarily in all. Nevertheless, data standards are about preventing different development teams from inadvertently introducing ambiguity. Even if common definitions are practiced across BI systems, it is very likely that the same could not be said when you head into the world of operational systems. Most operational systems have their own application-specific data names and data definitions (application-specific data vocabularies) for the data that they maintain. Therefore, when you consider Figure 4, it is not difficult to see the problem caused when integrating applications directly into enterprise portals, for example, to integrate and simplify user interfaces. One important question to ask when looking at Figure 4 (below): What is the problem with this architecture?
Figure 4 - SBVs provide consistency across multiple applications.
(Diagram: three applications are plugged into portlets on a portal through Web services, SQL or custom interfaces. Plugging applications into a portal means each application displays its own data names.)
The answer, of course, is obvious. All applications present their data using their own application-specific data names (vocabulary). If the same data is used in different applications (e.g., customer data, product data, order transactions, etc.) and each of these applications uses different data names for the same data, then the user has to know this to correctly understand the data presented on a portal page by the different applications. Worse is when different data in different applications has the same data names and this data is presented to the user on the same portal page. In this case the user once again must be aware of the differences in order to accurately understand what he or she is examining.
In order to resolve this problem, any application-specific data rendered using application-specific XML tags for presentation on a portal page needs to be intercepted and translated into commonly understood data names before the user sees it. This can be achieved by introducing a message broker or enterprise service bus (ESB) technology between the application and the portal. When this happens, data marked up using application-specific XML tags can be translated at run time into common XML tags based on SBV data definitions before the data appears on the screen. Simply put, the application-specific data definitions still exist but have been hidden by the introduction of message broker or ESB software.
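The translation step described above can be sketched in a few lines. This is not how any particular message broker or ESB product is implemented; it simply shows the idea of renaming application-specific XML tags to common ones at run time, using Python's standard xml.etree.ElementTree module. The tag names and the mapping table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one application's XML tags to common SBV tags.
TAG_MAP = {
    "record": "Customer",
    "CUST_NO": "CustomerId",
    "CUST_NM": "CustomerName",
}


def translate_tags(xml_text, tag_map):
    """Return xml_text with every mapped element tag renamed.

    Tags that have no entry in tag_map are passed through unchanged.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():          # iter() visits the root and all descendants
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


app_message = "<record><CUST_NO>42</CUST_NO><CUST_NM>Acme Ltd</CUST_NM></record>"
portal_message = translate_tags(app_message, TAG_MAP)
# portal_message is now:
# <Customer><CustomerId>42</CustomerId><CustomerName>Acme Ltd</CustomerName></Customer>
```

The application keeps producing its own tags; only the copy of the message that travels on to the portal is renamed.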
Similarly, if message broker or ESB technology is used for application integration in a service oriented architecture (SOA), then data in any messages that travel between application services as part of a business process needs to be translated from source application-specific mark-up to common mark-up, and from common mark-up to destination application-specific mark-up. What does all this mean? It means that business integration software needs to make use of SBV common data definitions and mappings from disparate systems to common definitions. Is this familiar? It should be, and to show you why, look at Figure 5 (below).
Figure 5 shows a remarkable coincidence when comparing application integration software (message brokers or ESBs) and on-demand data integration technology (sometimes referred to as enterprise information integration, or EII). EII products, part of an EDM technology suite, present virtual integrated views of disparate data in multiple underlying systems and allow these virtual views to be accessed as if the data was integrated in a database. These virtual views can be defined using common standard data definitions (i.e., using the SBV definitions). Data integrated in real-time by EII products will then render the data marked up using common data definition tags.
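A virtual view of the kind described above can be illustrated with a small sketch. It assumes two hypothetical source systems, crm and billing, each exposed through a function that fetches rows on demand. The view is assembled at query time using mappings to common names, and nothing is copied into a separate store. Real EII products do far more (query optimization, push-down, security), none of which is shown here.

```python
# Hypothetical source data, standing in for two live systems.
CRM_ROWS = [{"CUST_NO": 1, "CUST_NM": "Acme Ltd"}]
BILLING_ROWS = [{"acct_holder_id": 1, "balance_due": 120.5}]

# Hypothetical mappings from each system's data names to common SBV names.
MAPPINGS = {
    "crm": {"CUST_NO": "customer_id", "CUST_NM": "customer_name"},
    "billing": {"acct_holder_id": "customer_id", "balance_due": "outstanding_balance"},
}


def to_common(system, row):
    """Rename one source row into common vocabulary names."""
    return {MAPPINGS[system][name]: value for name, value in row.items()}


def customer_view(fetch_crm, fetch_billing):
    """Build an integrated customer view on demand from both sources.

    Rows are joined on the common customer_id; the result uses only SBV names.
    """
    merged = {}
    for system, fetch in (("crm", fetch_crm), ("billing", fetch_billing)):
        for row in fetch():                      # sources are read at query time
            common = to_common(system, row)
            merged.setdefault(common["customer_id"], {}).update(common)
    return [merged[key] for key in sorted(merged)]


view = customer_view(lambda: CRM_ROWS, lambda: BILLING_ROWS)
```

A consumer of the view never sees CUST_NO or acct_holder_id; it sees customer_id, whichever system the value came from.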
Figure 5 - Understanding why business integration software needs to make use of SBV common data definitions and mappings from disparate systems to common definitions
(Diagram: common metadata is needed in data and application integration technologies to achieve consistency. An EAI application integration platform maps the data vocabularies of three applications to a common vocabulary through web service adapters. An EII data integration platform works by giving applications an on-demand, virtual, integrated common vocabulary view of disparate data. Composite application services present all data in the common vocabulary in the portal via WSRP.)
These are the key points to remember:
• To integrate data when building a data warehouse using ETL technology, you need common data definitions for the target system, and you need to know how disparate data in source systems maps to common data definitions
• To integrate data on-demand (e.g., for reporting or presenting on a portal screen) using EII technology, you need common data definitions for the integrated virtual view of the data and need to know how disparate data in source systems maps to common data definitions in the virtual views
• To manage electronic data messages as they enter the enterprise and move between applications using message broker and ESB software, you need to know the common data definitions for data and how disparate data in source application systems maps to these common data definitions so that message translation can take place
• To achieve consistency across multiple performance management and reporting tools, you need common data definitions for the BI and performance management tool business views, and you need to know how disparate data in source systems maps to the common data definitions in the business views
• To define master data and solve the master data management problem, you need to define the master data entities using common data definitions, and you need the mappings from disparate data to master data and vice versa, so that master data integration and synchronization can be managed
In fact, everywhere you look you see precisely the same requirement again and again. The secret to Enterprise Data Management is therefore in the metadata. If you first capture the shared business vocabulary and all the mappings from disparate definitions to common SBV definitions, then that metadata can be provided to and shared across:
• EDM suite technologies, such as ETL tools or EII tools
• Data modeling tools for building data models using consistent common data definitions
• MDM applications and technologies
• BI and Performance Management business views
• Message brokers and Enterprise Service Bus technologies used in application and business process integration
• Portal technology to present the data for business use
Combine this with enterprise data quality and the data quality firewall discussed in the second paper in the series, and you can see the whole strategy for enterprise data management coming into clear focus and taking real shape. Figure 6 shows the power of common metadata when you have an SBV and know the mappings from disparate data definitions to common ones. If you have the tooling to do this work once, all the consistency needed to manage data across the enterprise stems from the same base metadata, as long as this metadata can be shared across technologies.
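To illustrate what doing the work once might look like, the sketch below takes a single mapping and generates two artifacts from it: an XSLT stylesheet that a message broker could use to rename tags, and a SQL view definition that a data integration tool could use as a common vocabulary view. The mapping layout, the naming convention for the view and both generator functions are hypothetical; the XSLT relies only on standard XSLT 1.0 constructs (an identity template plus one renaming template per mapped element).

```python
# One piece of common metadata: how a hypothetical CRM table maps to SBV names.
MAPPING = {
    "system": "crm",
    "table": "CUST",
    "columns": {"CUST_NO": "customer_id", "CUST_NM": "customer_name"},
}


def generate_sql_view(mapping):
    """Generate a view that exposes the source table under common data names."""
    columns = ", ".join(
        f"{source} AS {common}" for source, common in mapping["columns"].items()
    )
    view_name = f"common_{mapping['system']}_{mapping['table'].lower()}"
    return f"CREATE VIEW {view_name} AS SELECT {columns} FROM {mapping['table']}"


def generate_xslt(mapping):
    """Generate an XSLT 1.0 stylesheet that renames mapped elements.

    The identity template copies everything else through unchanged.
    """
    identity = (
        '<xsl:template match="@*|node()">'
        '<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>'
        "</xsl:template>"
    )
    renames = "".join(
        f'<xsl:template match="{source}">'
        f"<{common}><xsl:apply-templates/></{common}>"
        "</xsl:template>"
        for source, common in mapping["columns"].items()
    )
    return (
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        + identity + renames +
        "</xsl:stylesheet>"
    )


sql_view = generate_sql_view(MAPPING)
stylesheet = generate_xslt(MAPPING)
```

Because both outputs are derived from the same mapping, a change to the common name of a data item needs to be made in only one place.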
Figure 6 - Data governance and the power of common metadata
(Diagram: use created common metadata to generate mappings for multiple integration technologies to achieve consistency. Common metadata held by enterprise DQ and data integration tooling is used to generate the common vocabulary and XSLT mappings for the EAI application integration platform, and the common vocabulary virtual model and mappings for the EII data integration platform. Also shown: operational applications with their own data vocabularies, web service adapters, composite application services, master data (product, customer, asset) with CRUD access, historical data in a data warehouse and data marts, and a portal presenting all data in the common vocabulary via WSRP.)
Enterprise Data Quality
In addition to establishing a shared business vocabulary, an EDM strategy involves establishing a data quality firewall to validate data entering the enterprise via keyboard, electronic message or file. The way to handle this is to implement data profiling technology and data quality as a service, so that it can be invoked on demand, in batch, or on a timer-driven or event-driven basis. Enterprise data quality technology is a key piece of the EDM suite of technologies outlined in the second paper of this series. Common rules defined for a shared enterprise data quality service are vitally important to consistent data validation, data repair, data matching and handling of missing data.
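The following sketch shows the core idea of a shared data quality service: one set of common rules, applied identically whether a single record arrives on demand or a whole file arrives in batch. The rules, field names and function names are hypothetical, and the e-mail pattern is deliberately simplistic. Data repair and matching are not shown.

```python
import re

# Hypothetical common rules, keyed by SBV data name.
RULES = {
    "customer_name": lambda v: isinstance(v, str) and v.strip() != "",
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def validate(record):
    """Check one record against the common rules (the on-demand entry point).

    Returns a list of problems; an empty list means the record passes.
    """
    problems = []
    for field_name, rule in RULES.items():
        if field_name not in record:
            problems.append(f"{field_name}: missing")
        elif not rule(record[field_name]):
            problems.append(f"{field_name}: invalid value {record[field_name]!r}")
    return problems


def validate_batch(records):
    """Apply the same rules to a batch, separating accepted from rejected records."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


accepted, rejected = validate_batch([
    {"customer_name": "Acme Ltd", "email": "info@acme.example"},
    {"customer_name": " "},
])
```

Because validate_batch calls validate rather than reimplementing the checks, a record is judged the same way regardless of the channel through which it entered.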
Master Data Management
Master data management involves using the technologies in the EDM suite to solve the master data management problem. Key business entities such as Product, Asset, Employee, Customer and Supplier need to be defined using the common data definitions of the shared business vocabulary. Data integration technologies that leverage the SBV, and the mappings from disparate data to the SBV definitions, can then integrate master data from disparate line-of-business applications and persist it in a master data hub. In addition, the same metadata allows MDM solutions to synchronize subsets of master data used in operational applications when master data is updated centrally, and to supply dimensional data to data warehouses and data marts. The data quality firewall protects the quality of master data and manages all changes to it made via keyboard, electronic message and batch file.
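As a rough illustration of how the same mappings serve both integration and synchronization, the sketch below loads customer data from two hypothetical applications into a hub held in common names, then projects a master record back into one application's own names. The class, the system names and the simple rule that the most recently loaded value wins are all assumptions made for the example. Real MDM solutions apply matching and survivorship rules that are not shown.

```python
class MasterDataHub:
    """A toy master data hub for one entity (customer), held in SBV names."""

    def __init__(self, mappings):
        self.mappings = mappings     # system -> {local data name: common data name}
        self.customers = {}          # customer_id -> master record

    def load(self, system, row):
        """Integrate one application row into the hub using that system's mapping."""
        common = {self.mappings[system][name]: value for name, value in row.items()}
        # Assumption for this sketch: later loads overwrite earlier values.
        self.customers.setdefault(common["customer_id"], {}).update(common)

    def subset_for(self, system, customer_id):
        """Project a master record back into one application's own data names."""
        master = self.customers[customer_id]
        return {
            local_name: master[common_name]
            for local_name, common_name in self.mappings[system].items()
            if common_name in master
        }


# Hypothetical application vocabularies mapped to the same common names.
hub = MasterDataHub({
    "crm": {"CUST_NO": "customer_id", "CUST_NM": "customer_name"},
    "billing": {"acct_holder_id": "customer_id", "holder_name": "customer_name"},
})
hub.load("crm", {"CUST_NO": 1, "CUST_NM": "Acme Ltd"})

billing_update = hub.subset_for("billing", 1)
```

Here the record entered through the CRM application is delivered to billing in billing's own vocabulary, using nothing more than the two mappings to the common definitions.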
Conclusion
An enterprise data management strategy involves getting the organizational structure right, selecting technologies that form an integrated EDM suite for handling all metadata management and data management needs, putting controls in place, and setting up data governance processes. These things together allow the enterprise to take control of data ownership, achieve compliance and raise the bar on data quality, business practice and business confidence.
This EDM strategy has four key requirements: a shared business vocabulary; metadata integration and metadata sharing across enterprise integration technologies; a data quality firewall; and master data management. We have the pieces to solve the problem. Establishing a strategy for enterprise data management will help you take back control of the data in your enterprise.