Composite Software
Data Virtualization
Five Steps to More Effective Data Governance
Composite Software, Inc.
August 2011
TABLE OF CONTENTS
EVERYBODY LIKES DATA GOVERNANCE
FIVE REQUIREMENTS FOR MORE EFFECTIVE DATA GOVERNANCE
COMMON DATA GOVERNANCE TRAPS
MORE DATA REPLICATION MAKES THE GOVERNANCE PROBLEM LARGER
MORE SILOS MAKE ENTERPRISE DATA GOVERNANCE HARDER
ARCHITECTURE COMPLEXITY SLOWS DATA GOVERNANCE ADOPTION
FIVE WAYS DATA VIRTUALIZATION ENABLES MORE EFFECTIVE DATA GOVERNANCE
ACCESSIBLE DATA
SECURE DATA
CONSISTENT DATA
QUALITY DATA
AUDITABLE DATA
CONCLUSION
EVERYBODY LIKES DATA GOVERNANCE
As with motherhood and apple pie, who can argue with data governance?
Business users like it because it ensures that critical business decisions are based on sound data. IT likes data governance because it shows that IT, as the organization’s data steward, is doing a good job. Compliance officers and risk managers like data governance because it lets them sleep at night.
Yet with data volumes, variety, variability and complexity growing significantly, along with onerous new compliance requirements, enterprises are struggling to turn the concept of data governance into a reality.
This paper describes a number of the challenges that organizations face when implementing data governance, and how data virtualization can resolve many of them.
Data virtualization directly addresses many of the root causes of data governance problems. In addition, data virtualization itself delivers numerous data governance capabilities by controlling access to, and delivery of, consistent, secure, high-quality, auditable data for more intelligent business decision-making.
As a result, effective data governance is easier to implement, and enterprises can achieve their data governance objectives faster, more easily and with less overall risk.
FIVE REQUIREMENTS FOR MORE EFFECTIVE DATA GOVERNANCE
Many technical articles and white papers define data governance, so it does not make sense to include a lengthy treatment here. However, it is helpful for our discussion to identify data governance’s most critical requirements.
Data governance is a set of well-defined policies and practices designed to ensure that data is:
• Accessible – Can the people who need the data access it? Does the data match the format the user requires?
• Secure – Are authorized people the only ones who can access the data? Are unauthorized users prevented from accessing it?
• Consistent – When two users seek the "same" piece of data, is it actually the same data? Have multiple versions been rationalized?
• High Quality – Is the data accurate? Has it been conformed to meet agreed standards?
• Auditable – Where did the data come from? Is the lineage clear? Does IT know who is using it and for what purpose?
We will now explore how data virtualization supports data governance to enable enterprises to meet these important business requirements.
COMMON DATA GOVERNANCE TRAPS
While we may not want to admit it, traditional data integration approaches make data governance harder. By approaching data integration in a new way, data virtualization avoids or lessens the impact of the three biggest data governance traps: too much replication, proliferating silos and overwhelming architectural complexity. As a result, data virtualization simplifies data governance implementations and improves the odds of data governance success.
More Data Replication Makes the Governance Problem Larger
The predominant data integration methodology during the past two decades has been to replicate and consolidate data into another store, such as a data warehouse, usually employing extract, transform and load (ETL) technology. Copying data is intended to simplify accessibility. But all those extra copies significantly complicate data quality, consistency, security and auditability, and thus make the data governance problem larger and more daunting.
Data virtualization, on the other hand, integrates data without replication. This allows organizations to focus their governance on the original systems of record only, rather than on all the proliferating copies, cutting the amount of governance required by as much as fifty percent.
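To make integration without replication concrete, here is a minimal Python sketch, not Composite’s implementation. It assumes two hypothetical systems of record (a CRM customer table and a billing invoice table), modeled as in-memory lists, and a hypothetical `VirtualView` class that reads both sources at query time and joins them while storing no data of its own.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical systems of record, modeled as in-memory tables for illustration.
crm_customers: List[Row] = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 2, "name": "Globex"},
]
billing_invoices: List[Row] = [
    {"customer_id": 1, "amount": 1200.0},
    {"customer_id": 1, "amount": 300.0},
    {"customer_id": 2, "amount": 450.0},
]


class VirtualView:
    """A view defined over live sources; it keeps references, not copies."""

    def __init__(self, customers: Callable[[], List[Row]],
                 invoices: Callable[[], List[Row]]) -> None:
        self._customers = customers
        self._invoices = invoices

    def query(self) -> List[Row]:
        # Read both sources at request time and join them in memory.
        totals: Dict[object, float] = {}
        for inv in self._invoices():
            key = inv["customer_id"]
            totals[key] = totals.get(key, 0.0) + float(inv["amount"])
        return [
            {"name": c["name"], "total_billed": totals.get(c["customer_id"], 0.0)}
            for c in self._customers()
        ]


view = VirtualView(lambda: crm_customers, lambda: billing_invoices)
print(view.query())

# A change in the system of record is visible on the next query; there is
# no replicated copy to refresh or to govern separately.
billing_invoices.append({"customer_id": 2, "amount": 50.0})
print(view.query())
```

Because the view holds only references to its sources, the new invoice appears on the next query. In a replication design, the same change would wait for the next ETL run, and every copy along the way would need its own governance.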
More Silos Make Enterprise Data Governance Harder
The proliferation of data silos in the enterprise is accelerating, with new “purpose-built” data stores multiplying the problem. It is not uncommon for enterprises to have hundreds of transactional information sources, operational data stores, data warehouses with multiple derivative data marts, and more. Each of these store types has several implementation options, ranging from traditional relational databases such as Oracle, to massively parallel processing (MPP) data appliances such as Netezza, to NoSQL and distributed solutions such as Hadoop. As a result, typical enterprises deploy multiple technologies in each category.
With an enterprise-wide virtualized architectural layer sharing a common schema and reusable resources, silo complexity is dramatically reduced. When data consumers are no longer tied directly to the physical location of specific data, accessibility and consistency are enhanced. Further, a shared data virtualization layer improves quality, security and auditability across the underlying silos, further strengthening overall data governance.
Architecture Complexity Slows Data Governance Adoption
Years of well-intentioned efforts have resulted in IT architectures so complex that they are difficult to govern effectively. Often these byzantine architectures are loaded with brittle dependencies, making them almost impossible to change while the business is running. This forms a huge barrier to the adoption of new data governance policies and processes.
Data virtualization decouples data consumers from data providers, replacing IT brittleness with IT agility. This agility facilitates the changes needed to implement effective data governance policies and processes. For example, implementing new authorization and encryption rules to improve data security is faster and easier because IT can make the authorization and encryption changes in one place (within the data virtualization middleware) and affect all of the sources and consumers in one step. Similarly, data quality and consistency changes instantiated within data virtualization can be deployed enterprise-wide.
FIVE WAYS DATA VIRTUALIZATION ENABLES MORE EFFECTIVE DATA GOVERNANCE
Enterprises cannot buy data governance solutions off the shelf, because effective data governance requires complex policies and practices, supported by software technology and integrated across the wider enterprise IT architecture. As such, it is important that enabling technologies such as data virtualization support the accessibility, security, consistency, quality and auditability capabilities required for effective data governance.
Accessible Data
It is commonly estimated that as much as 80 percent of any new development effort is spent on data integration, making data access, rather than application development, the most time-consuming and expensive activity. Most users access their data via business intelligence (BI) and reporting applications. These applications typically rely on data integration middleware to access and format the data before the application displays it. As a result, much of the burden of ensuring proper governance falls on the data integration middleware.
By eliminating the physical builds and testing that replication and consolidation approaches require, data virtualization is a more agile and cost-effective method to access, integrate and deliver data. This agility lets enterprises provide data access faster and more easily.
Secure Data
Ensuring that only authorized users can see appropriate data, and nothing more, is a critical data governance requirement. This is a straightforward task for single systems and small user counts, but becomes more complex and difficult in larger enterprises with hundreds of systems and thousands of users. As a first step, many enterprises have implemented single sign-on (SSO) technologies that allow individuals to be uniquely authenticated across many diverse systems. However, implementing security policies (i.e., authorization to see or use certain data) in individual source systems alone is often insufficient to ensure appropriate enterprise-wide data security. For some highly sensitive data, encrypting the data as it moves across the network is an additional requirement.
Data virtualization not only uses single sign-on capabilities to authenticate individuals and enforces authorization policies for them, but can also encrypt any and all of the data it delivers. As such, data virtualization becomes the data governance focal point for implementing security policies across multiple data sources and consumers.
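The following Python sketch illustrates enforcing authorization in one place. It assumes that users have already been authenticated (for example, through SSO) and arrive with a set of roles. The `COLUMN_POLICY` table, the role names and the column names are hypothetical, and the sketch covers column-level authorization only, not encryption.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical policy: which columns each role may see. It is defined once,
# in the virtualization layer, rather than separately in every source system.
COLUMN_POLICY: Dict[str, Set[str]] = {
    "analyst": {"customer_id", "region", "total_billed"},
    "finance": {"customer_id", "region", "total_billed", "tax_id"},
}


def authorize(roles: Set[str]) -> Set[str]:
    """Return the union of columns granted to the caller's roles."""
    allowed: Set[str] = set()
    for role in roles:
        allowed |= COLUMN_POLICY.get(role, set())
    if not allowed:
        raise PermissionError("no role grants access to this view")
    return allowed


def secure_query(roles: Set[str], rows: List[Row]) -> List[Row]:
    # Every consumer goes through this one function, so a policy change
    # takes effect for all consumers at once.
    allowed = authorize(roles)
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [{"customer_id": 1, "region": "EMEA", "total_billed": 1500.0, "tax_id": "X-123"}]
print(secure_query({"analyst"}, rows))  # tax_id is withheld
print(secure_query({"finance"}, rows))  # tax_id is included
```

Editing `COLUMN_POLICY` changes what every consumer of the view can see, without touching the underlying source systems.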
Consistent Data
Consider the following commonplace scenario: Two people attend a meeting with reports or graphs generated from the “same” data, but they show different numbers or results. Likely, they believed they were using the same data. In reality, they were each using their own replicated, consolidated, aggregated version of the data.
Data virtualization allows enterprises to prevent this scenario by establishing consistent and complete canonical data definitions (canonicals) that apply across business intelligence and analytic use cases.
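A minimal Python sketch of a canonical: one shared definition of net revenue that two different reports both use. The definition (gross minus returns, excluding cancelled orders), the `orders` data and the report functions are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]

orders: List[Row] = [
    {"order_id": 1, "region": "EMEA", "gross": 1000.0, "returns": 100.0, "status": "closed"},
    {"order_id": 2, "region": "EMEA", "gross": 500.0, "returns": 0.0, "status": "cancelled"},
    {"order_id": 3, "region": "APAC", "gross": 800.0, "returns": 50.0, "status": "closed"},
]


def net_revenue(rows: List[Row]) -> List[Row]:
    """Hypothetical canonical definition: net = gross - returns, cancelled orders excluded."""
    return [
        {"order_id": r["order_id"], "region": r["region"],
         "net": float(r["gross"]) - float(r["returns"])}
        for r in rows
        if r["status"] != "cancelled"
    ]


def sales_dashboard(rows: List[Row]) -> float:
    # Report 1: a single headline number, built on the canonical.
    return sum(float(r["net"]) for r in net_revenue(rows))


def finance_report(rows: List[Row]) -> Dict[str, float]:
    # Report 2: a regional breakdown, built on the same canonical.
    by_region: Dict[str, float] = {}
    for r in net_revenue(rows):
        region = str(r["region"])
        by_region[region] = by_region.get(region, 0.0) + float(r["net"])
    return by_region


print(sales_dashboard(orders))
print(finance_report(orders))
```

Because both reports derive their figures from `net_revenue`, the dashboard total and the sum of the regional figures agree, and a change to the definition reaches both reports together.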
Quality Data
Correct and complete data is a critical data governance requirement. However, data quality is often treated as an afterthought to data creation and modification, typically addressed during the data consolidation process. This approach impedes good data quality across the rest of the enterprise, because only the consolidated copy is cleansed. The modern trend in data quality and governance is to push quality practices back toward the source systems, so that data is of the highest quality right from the start.
Data virtualization leverages these “systems of record” when delivering data to the consumer, so it naturally delivers high-quality data. In addition, data virtualization allows data quality practices like enrichment and standardization to occur inline, giving the data stewards more options for ensuring data is of the highest quality when it reaches the consumer.
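A small Python sketch of inline standardization and enrichment. The cleanup rules and the `COUNTRY_CODES` and `REGION_BY_COUNTRY` reference tables are hypothetical; the point is that the rules run as rows pass through the delivery path, while the source rows themselves are left unchanged.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical reference data used for standardization and enrichment.
COUNTRY_CODES = {"usa": "US", "united states": "US", "germany": "DE", "deutschland": "DE"}
REGION_BY_COUNTRY = {"US": "AMER", "DE": "EMEA"}


def standardize(row: Row) -> Row:
    out = dict(row)  # work on a copy so the source row is untouched
    # Standardization: collapse whitespace and normalize name casing.
    out["name"] = " ".join(str(row["name"]).split()).title()
    # Standardization: map country spellings to a two-letter code.
    raw_country = str(row["country"]).strip().lower()
    out["country"] = COUNTRY_CODES.get(raw_country, raw_country.upper())
    # Enrichment: derive a region; None marks values the rules could not resolve.
    out["region"] = REGION_BY_COUNTRY.get(str(out["country"]))
    return out


def deliver(rows: List[Row]) -> List[Row]:
    # Quality rules run inline, as rows move from source to consumer.
    return [standardize(r) for r in rows]


source = [
    {"name": "  acme   corp ", "country": "USA"},
    {"name": "globex", "country": "Deutschland"},
    {"name": "initech", "country": "fr"},
]
for r in deliver(source):
    print(r)
```

Unresolved values (here, a `region` of `None` for the third row) can be routed to data stewards for review rather than silently passed through.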
Auditable Data
On the data source side, good data governance policy requires that IT can explain where data comes from, and prove its source. On the data consumer side, good data governance policy requires that IT show who used the data, and how it was used. Traditional data integration copies data from one place to another. As a result, the copied data becomes “disconnected” from the source, making it difficult to establish a complete source-to-consumer audit trail.
Data virtualization integrates data directly from the original source and delivers it directly to the consumer. This end-to-end flow, without creating a disconnected copy of the data in the middle, simplifies and strengthens data governance. When auditing is required, full lineage is readily available at any time within the data virtualization metadata and transaction histories.
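The Python sketch below illustrates how a virtualization layer could record both sides of the audit trail: which source systems feed a view (lineage), and who consumed it, when and for what purpose. The `AuditedLayer` class, its view catalog and the source names are hypothetical, and a real system would persist the log rather than keep it in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Fetch = Callable[[], List[Row]]


@dataclass
class AuditEntry:
    view: str
    sources: Tuple[str, ...]  # lineage: systems of record behind the view
    consumer: str             # who used the data
    purpose: str              # stated reason for access
    row_count: int
    at: datetime


@dataclass
class AuditedLayer:
    # Hypothetical catalog: view name -> (source systems, function that reads them live).
    views: Dict[str, Tuple[Tuple[str, ...], Fetch]]
    log: List[AuditEntry] = field(default_factory=list)

    def lineage(self, view: str) -> Tuple[str, ...]:
        return self.views[view][0]

    def query(self, view: str, consumer: str, purpose: str) -> List[Row]:
        sources, fetch = self.views[view]
        rows = fetch()
        # Record the source-to-consumer trail for every delivery.
        self.log.append(AuditEntry(view, sources, consumer, purpose,
                                   len(rows), datetime.now(timezone.utc)))
        return rows


layer = AuditedLayer(views={
    "customer_revenue": (
        ("crm.customers", "billing.invoices"),
        lambda: [{"customer_id": 1, "total_billed": 1500.0}],
    ),
})
layer.query("customer_revenue", consumer="jdoe", purpose="quarterly board report")
for entry in layer.log:
    print(entry.at.isoformat(), entry.consumer, entry.purpose,
          entry.view, "<-", ", ".join(entry.sources))
```

Because lineage lives in the view definition and consumption is logged at delivery time, an auditor can answer both "where did this come from?" and "who used it?" from a single place.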
CONCLUSION
As data governance becomes increasingly prevalent in enterprise information management strategies, forward-looking organizations are deploying methods that simplify data governance.
Data virtualization not only makes data governance easier in practice, but also shortens the time to begin achieving the data governance benefits of consistent, secure, high-quality data for more intelligent business decision-making.
ABOUT COMPOSITE SOFTWARE
Composite Software, Inc. is the data virtualization performance leader.
Backed by a decade of pioneering R&D, Composite Software is the data virtualization gold standard at 10 of the top 20 banks, six of the top 10 pharmaceutical companies, four of the top five energy firms, major communications providers and the world’s largest IT organization, the US Army.
These and hundreds of other global organizations rely on Composite Software to fulfill their ever-changing information requirements with greater agility and lower costs.
Composite Software is a registered trademark of Composite Software, Inc. Copyright © Composite Software, Inc. 2011.
2655 Campus Drive, Suite 200 T / 650.227.8200