The Utilization of Information Architecture at the Enterprise Level:

by David Loshin and Charles Roe

AN ANALYSIS OF A 2013 DATAVERSITY™ SURVEY

1. EXECUTIVE SUMMARY

This report investigates the level of Information Architecture (IA) implementation and usage at the enterprise level. The primary support for the report is an analysis of a 2013 DATAVERSITY™ survey on Data and Information Architecture.

Modern enterprises need the ability to trace the movement of every data asset throughout the entire information stream, from creation to deletion. Yet, as the results of this survey demonstrate, such ability is not a reality for many organizations. The survey highlights some of the primary areas of struggle, including enterprise-wide terminology confusion, a lack of future implementation plans, questions about what an information architecture should cover, the use of enterprise data models, manual versus automated integration systems, MDM initiatives, and attitudes toward Data Virtualization.

Some of the noteworthy findings include:

• Close to half of the respondents said they define Data Architecture (DA) and Information Architecture as the same concept.

• Two thirds do not have a formal definition of DA or IA in their organizations.

• Only 4.2% say they have documented their DA in all systems.

• More than two thirds employ DA at the physical level.

• Data modeling, data warehousing, and naming standards are the top three choices for what aspects should be addressed in DA.

• The primary goal of DA for more than three quarters of the respondents is to facilitate data and application integration.

• More than 40% do not have an enterprise data model.

• Half of the respondents’ organizations are still employing spreadsheets as the primary means of managing metadata.

• The most employed data services are for data integration and data access.

• Views and understanding of Data Virtualization are mixed. While it is not currently included in most DAs, there is wide support that it “should be included” in DA. Interestingly at the same time, it also has the highest percentage of respondents saying it “should not be included.”

2. REPORT DEMOGRAPHICS AND METHODOLOGY

The focus of the 2013 survey was to gain a comprehensive perspective on where Information Architecture stands within the world of enterprise-level Data Management, through looking at many of the primary areas where Information Architecture is used.

The survey contained 29 questions, broken into the following sections:

• General Demographics Information (three questions)

• Current and Future IA Implementation (five questions)

• Definitions/Levels of IA/DA (four questions)

• IA and Data Modeling (two questions)

• IA and Metadata (two questions)

• IA and Data Integration (three questions)

• IA and Master Data Management (five questions)

• IA and Data Virtualization (five questions)

A total of 205 participants took the survey, with some questions having more respondents than others; the average number of respondents per question was 140. Eleven questions also allowed open-ended responses; these are discussed in the relevant sections to give readers a clearer picture of what respondents said in more detail.

The first demographics question, other than general contact information, focused on the respondent’s job function [Figure 1]. Respondents could select more than one answer if they had more than one job function; the top three responses were:

• Data and/or Information Architecture: 63.3%

• Information/Data Governance: 31.1%

The next demographics question [Figure 2] asked about the respondent’s industry and covered the gamut of possible industries, from consulting (8.3%) to energy (3.9%), finance (10.2%) to healthcare (8.3%), and retail (4.9%) to banking (6.3%). The top four responses for this question were:

• Insurance (16.1%)

• Finance (10.2%)

• Technology (8.8%)

• Consulting, Education, Healthcare (each 8.3%)

Figure 1: What is your job function? (Bar chart, 0% to 70%. Categories: Executive management; Finance management and/or reporting; Content and/or digital asset management; Application development; Data and/or information architecture; Business intelligence and/or analytics; Information/Data Governance; Documents and/or records management; Corporate research and/or library; Marketing and/or market research; IT management; Software or system vendor; Other.)

The last of the demographics questions centered on company size. The two largest groups of respondents came from companies with 1,000–4,999 employees (21.1%) and with 50,000 or more employees (21.1%), with a wide range of other sizes represented [Figure 3]:

Figure 3 (204 respondents): company size by number of employees

• Less than 10: 9%
• 11 – 99: 4%
• 100 – 999: 13%
• 1,000 – 4,999: 21%
• 5,000 – 9,999: 13%
• 10,000 – 49,999: 19%
• 50,000 or more: 21%

3. INTRODUCTION–INFORMATION ARCHITECTURE AND THE ENTERPRISE

The need to have an enterprise-wide information infrastructure is not really a new revelation; the idea has been around as long as businesses have sought systematized planning to aid in market success.

Yet, the reality of such an infrastructure is something many enterprises still struggle with today. The speed, size, and complexity of the contemporary marketplace have forced enterprises to mature past the point where they can get by with ad-hoc information structures. The absence of a well-developed information architecture damages an organization’s agility in all aspects of the business cycle.

It’s All in a Term–What do we mean by Information Architecture?

Clichés about data have flourished throughout the industry for good reason: data is the lifeblood, the glue, the bricks, and the gold of the modern enterprise. If an organization does not have a coordinated system that adeptly manages its data resources, then those resources cannot be trusted.

But what about information? Information is data put into action. A specific data element may exist on a server somewhere in one format or another, but it has less business value until it is integrated with other data elements into an information package. Information is data with context. So where Data Architecture is necessary to contain and organize the manifold data resources into a manageable system, Information Architecture is necessary to combine those resources into a structure that allows information to be captured, shared, analyzed, utilized, and governed throughout an enterprise, across all lines of business and within all departments, with confidence and reliability.

The term Information Architecture (IA) is surrounded by acronyms, closely related terminology, varying definitions, and often confusion. Some enterprises use the terms Data Architecture (DA) and Information Architecture interchangeably. Some employ Enterprise Information Architecture (EIA), Enterprise Information Management (EIM), Enterprise Data Architecture, Enterprise Architecture (EA), or any number of others that certainly differ in implementation and methodology, but overall have similar ends in mind. The principal point of all of them is to leverage data and information resources reliably so that decision making, resource sharing, procedures, reporting, and other essential business functions work for the advancement of the enterprise, rather than against it.

Information Architecture provides enterprises with an end-to-end perspective of their information environment for the complete data lifecycle, from acquisition and intake to management and retirement. Information Architecture is the framework that connects the IT systems, data models, governance policies, business drivers and processes, metadata management, master data, BI reporting and analytics, data migration and integration systems, data quality, and others throughout the entire system into a coordinated and structured whole, so the enterprise knows what, when, where, why, and how its data (and subsequent information) is being utilized, and by whom.

Clearing up the Confusion–Definitions Are Important

In this paper, we use the term Information Architecture for the overarching system employed at the enterprise level that ties each of the architectural elements together into a unified whole. Data Architecture is the term used to delineate the movement and structure of the given data elements at lower levels of abstraction; these are discussed separately in the sections of the paper on data modeling, data integration, MDM, and others. We decided to specify these two terms in more detail because of the survey results and the individual answers from our respondents, and for deeper clarification.

In the survey, two specific questions were asked: 1) if the respondents’ organizations defined DA and IA differently, and 2) if their organization had a formal definition of DA. The first question [Figure 4] demonstrates how often the terms are interchangeable in most organizations, as 47.7% said they treat them as the same concept. 30.5% said they do not have a definition of IA and only 14.6% said they define them differently:

This fact presents a rather important quandary for many organizations seeking to develop a better DA and/or IA structure: while terminology is specific to a given organization, it is necessary to have a well-developed system of concepts so all stakeholders involved understand what is being discussed. If a particular project is seeking to work on DA, under the scope of metadata management and data modeling, then the use of DA is fine. It makes sense; the data is being architected so that it can also tie into other systems or higher levels of abstraction within the enterprise.

If an enterprise-wide IA project is also underway, yet various stakeholders are referring to it as DA, while others are calling it IA, while others may be referring to it as EIA or EA, then confusion ensues and time/resources are ultimately wasted. If such confusion exists throughout the enterprise, then even “simple” document management can become a quagmire of indecision and misunderstanding.

On the other hand, the absence of a distinction between the two phrases (Data Architecture vs. Information Architecture) in most cases suggests that most organizations’ levels of capability and maturity for either practice are still at a rather initial state.

The concept of an “architecture” for data, which until recently had only been viewed as a byproduct of functional processes, is still in its infancy. That suggests that even a rudimentary definition for either practice is a good starting place for identifying the fundamental practices associated with information management, delineating roles, responsibilities, and expectations, and establishing the means by which observance of those expectations can be measured.

The second question [Figure 5] asked if the respondents’ organizations had a formal definition of DA and what that definition was. It also asked respondents to give their own definition if they answered “no” to the question; 66.7% of the respondents answered “no.”

So in total, almost half of our respondents’ organizations define DA and IA the same within their organizations, but two thirds of them do not actually have a formal definition.

The open-ended responses (listed below) to this question further highlight this issue as a point of contention when seeking to create an overarching platform for information management within an enterprise. The definitions vary widely. There were a total of 129 open-ended responses across the “yes” and “no” answers; the ones listed below are a small selection of the most telling, chosen to show the range of disparities (and similarities) in the usage and definition of IA and DA at the enterprise level, and organized into some higher-level categories:

• Policy and Requirements Management

¤ A mechanism for determining what information the organization needs to meet its business needs and how it should be provided.

¤ The blueprint for the life-cycle of business information that provides structure, control, and consistency for the data landscape. It is focused on describing how data is stored, processed, and utilized. An Enterprise Data Architecture helps arrange the strategic data requirements and the related components of the information management solution at the enterprise level, and supports the ability to leverage data into business intelligence.

¤ Ensuring all data is modeled, is secure, and has an owner.

¤ Embodied in the objective, “Provide trusted business information, with the right level of data quality, available in the right form, in the right context, and at the right time.”

¤ Yes, Information Architecture–the collection of components used to manage valuable enterprise information assets. This includes plans, policies, principles, models, standards, frameworks, technologies, organization, and processes that will ensure that integrated data delivers business value and aligns business priorities and technology.

¤ Discipline of managing information within the organization, including its structure, meaning, governance, and dissemination.

¤ Data Governance, Information Ownership, Data Flow, and Standards.

¤ Data Architecture is the integrated set of business and IT data specification models and artifacts reflecting enterprise data requirements.

¤ We consider data architecture to be different from information architecture. Information architecture is an enterprise-wide solution that assures the availability and proper care of data/information.

• Interoperability

¤ Crosses all of the capabilities and business processes of the enterprise and focuses on integrating, sharing, and reconciling disparate information “views” to enhance flexibility and growth.

• Data Management and Metadata Management Standards

¤ Data modeling standards for some systems.

¤ Data models, metadata, metamodels, policies, rules, and standards that govern data, how they are stored, arranged, tested, deployed, reversed, and put to use in a database system (physical and/or virtual), and/or in an organization.

¤ Data Architecture is the understanding of the data used by the enterprise and the structuring of that data providing the foundation for access to trusted data.

¤ A description of the structure and interaction of the enterprise’s major types and sources of data, logical data assets, physical data assets, and data management resources.

¤ Data Architecture describes how data is organized, named, defined, stored, and exchanged across the enterprise. It includes the enterprise data model, data definitions, the business glossary as related to concepts captured in data, the taxonomy and hierarchy of information, and the relationships between data domains. The Data Architecture also informs standards to which data designs for applications and data exchange should conform.

¤ Data modeling and database design, metadata management, data store and ETL strategy.

• Data Lifecycle Management

¤ We actually call it Information Architecture… “The purpose of information architecture is to provide guidance on a formal, structured set of components that will be used across the information lifecycle to create, manage, use and retire the underlying data/information assets.”

¤ Data Architecture is one of the pillars of Enterprise Architecture and refers to all aspects of creating, housing, delivering, maintaining and retiring data with the goal of managing data as a valuable corporate asset. A Data Architecture is often the design of data for use in defining the target state and the subsequent planning needed to achieve the target state.

¤ We don’t define the term, but we do define the accountabilities. They include owning data design standards and ensuring complete solutions across the data life cycle.

¤ Tracking of data from creation to expiration—including but not limited to storage, usage rules and roles and definition.

• Data Layouts and Models

¤ Data Warehouses and Data Marts.

¤ Conceptual and Physical Data Models and Data Dictionaries.

¤ Enterprise Data Architecture is responsible for the governance and design of database structures. Enterprise Data Architecture enables a variety of data storage and retrieval systems such as: Transactional, Reporting (Data Warehousing), OLAP. Enterprise Data Architecture promotes the following objectives and principles: Enterprise Data Reuse, Standardized Data Structures, and Efficient Master Data Management.

¤ Where structured and unstructured data are stored within the hardware and software that is known to the organization.

¤ Conceptual model up to defined data purpose.

¤ Models, policies, rules, or standards that govern which data is collected, and how it is stored, arranged, and put to use in a database system, and/or in an organization.

¤ Data models for relational systems, XML schemas, and data transformation specs for data integration projects, all following best practices for data.

¤ Data Architecture describes the data structures and systems in an organization.

• Uncategorized

¤ Extension of Enterprise Architecture.

¤ The “road map” for the information asset.

¤ A description at various levels, which formalizes the architectural component that manages what an organization requires in order to satisfy its mission. This includes, but is not limited to, traditional data, documents, industry data, video data, and any data about data.

As discussed above, we will be using the term Information Architecture for the overarching enterprise-wide structure that combines all of the many systems into one coordinated perspective.

IA improves communication between IT and business units. It facilitates the policies, processes, technologies, and procedures used to capture, store, administer, and analyze information entities, in a way that remains consistent throughout the entire information stream. It provides a systemic viewpoint of how individual data structures, systems, tools, and models interact with each other across the entire organization. It is integral to the effective sharing of data and information elements used throughout the enterprise. It gives a solid foundation for the management of metadata resources, BI mechanisms, data quality procedures, MDM facilitation, modeling at all levels of abstraction, system migration and integration, business process management, decision-making frameworks, and any other necessary components within the information management systems of an enterprise.

Information Architecture includes, within its scope, all the elements of Data Architecture as well. You can’t really have one without the other, or the entire system breaks down, the data is unreliable, and no one in the enterprise can ultimately trust the information they use daily to complete their work.

The survey looks at each individual element in terms of Data Architecture; we are placing each of those elements under the rubric of Information Architecture.

4. ADDRESSING ENTERPRISE NEEDS–CURRENT AND FUTURE IMPLEMENTATION

IA has been around for decades. Its earliest beginnings date back to the mid-1960s and ’70s. Its growth continued into the 1980s, but IA concepts and methods truly expanded in the 1990s with the Internet.

The concept of enterprise-level information management, while not necessarily new in the 1980s, truly began to catch hold in the collective imaginations of the Data Management industry with John Zachman’s work and with his article “A Framework for Information Systems Architecture” in the IBM Systems Journal in 1987.

Today, there are several high-quality IA management systems for organizations of all sizes, and multitudes of white papers and articles extol the benefits of a given system or methodology. Information Architects have viable work in many enterprises, colleges and universities offer courses in IA, and it is a topic of meetings in boardrooms worldwide.

Yet, even though IA’s benefits and usefulness to organizations are well known and widely recognized, its actual implementation often remains uncoordinated, sporadically documented, and certainly not used as effectively as it should be.

The survey addressed this issue within five questions that asked about DA within current systems, DA within future systems, aspects that are/should be addressed within the enterprise, levels where DA operates, and the goals of DA within the respondent’s enterprise.

Survey Results and Statistics

The implementation of DA within various enterprise systems is incomplete for most organizations. The results of the first question [Figure 6] highlight that, while some organizations have varying systems addressed and documented, overall most are still lacking in many areas. It asked respondents, “To what extent does your organization have a Data Architecture that addresses your existing data and systems?”

Overall, 36.9% of those responding to this question said they had only partial documentation for some systems; 36.3% said they had complete documentation for some systems; and only 7.1% said they had complete documentation.

When viewed alongside the following question, “To what extent does your organization have a Data Architecture that addresses your future data systems and needs?”, the challenge becomes evident: 39.9% have only partially documented some systems, 12.5% have complete documentation for some systems, and 9.5% have no documentation at all [Figure 7].

Figure 6 (168 respondents): To what extent does your organization have a Data Architecture that addresses your existing data and systems? (Bar chart, 0% to 40%. Categories: Complete and documented data architecture for all systems; Complete and documented for some systems; Partially documented for some systems; Informal or undocumented data architecture; We have no data architecture at all.)

These percentages must also be viewed in light of the actual number of participants responding to these questions. In both cases, 168 of the 205 participants answered, so roughly 18% did not provide any answer.

Clearly, most organizations are works in progress when it comes to fully developed DA documentation and system implementation and, by extension, well-structured, enterprise-wide IA platforms.

When asked about the levels at which their DAs operate [Figure 8], the results were quite balanced, with Physical Level the highest at 69.9% and Reference Level the lowest at 45.9% (respondents could select more than one answer).

Figure 7 (168 respondents): To what extent does your organization have a Data Architecture that addresses your future data and systems needs (eg. for Big Data, Unstructured Data, Real-… (Bar chart, 0% to 45%. Categories: Complete and documented data architecture for all systems; Complete and documented for some systems; Partially documented for some systems; Informal or undocumented data architecture; We have no data architecture at all.)

The last two questions of this section allowed respondents to go into much greater detail about their DA structures. Figure 9 asked respondents to select what aspects of Data Management are or should be addressed in their DA; while Figure 10 asked them about the primary goals of their DA. Figure 9 broke the answers into 20 categories, ranging from data modeling to data quality, ETL to data virtualization, data security to reference data and many others. It had three answer categories of “included,” “should be included,” and “should not be included.” The primary choices for the elements already included were:

• Data Modeling: 76.1%

• Data Warehousing: 64.9%

• Naming Standards: 62.1%

• Database Design: 61.6%

Figure 8 (146 respondents): levels at which the Data Architecture operates. (Bar chart, 0% to 80%. Categories: Reference level (e.g. industry or strategic level); Conceptual; Logical; Physical.)

The aspects most often marked “should be included” were:

• Data Virtualization: 58.9%

• Data Governance: 54.6%

• Data Services/SOA: 54.5%

• Data Retention: 52.3%

Those most often marked “should not be included” were:

• Data Virtualization: 18.6%

• Database Design: 11.9%

• Data Storage: 10.4%

• Business Intelligence: 10.1%

Figure 9 (157 respondents): Which of the following aspects of data management are or should be addressed in your data architecture? (Chart, 0% to 120%. Aspects: Data modeling; Data storage; Metadata; Data governance; Data quality; Data integration; ETL; Data virtualization; Master data; Database design; Analytical data; Data services/SOA; Data security; Privacy; Data retention; Data movement; Naming standards; Reference data; Data warehousing; Business intelligence.)
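The three answer options for this question allow a rough back-of-the-envelope check. The sketch below is an illustration only: it assumes that every respondent chose exactly one of the three options for each aspect, so that the three shares sum to 100%, which the report does not state. Under that assumption, the one share not quoted in the text can be estimated from the two that are.

```python
# Percentages quoted in the text for this question; None means "not quoted".
quoted = {
    "Data Virtualization": {"included": None, "should_be": 58.9, "should_not": 18.6},
    "Database Design": {"included": 61.6, "should_be": None, "should_not": 11.9},
}


def fill_missing_share(shares):
    """Estimate the single missing share, ASSUMING the three options sum to 100%."""
    missing = [name for name, value in shares.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one share must be missing")
    known_total = sum(value for value in shares.values() if value is not None)
    filled = dict(shares)
    filled[missing[0]] = round(100.0 - known_total, 1)
    return filled


for aspect, shares in quoted.items():
    print(aspect, fill_missing_share(shares))
```

If the assumption holds, Data Virtualization would currently be included in roughly 22.5% of respondents’ architectures, which fits the observation that it is not yet part of most DAs.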

The picture of where enterprises stand with their DA platforms becomes sharper when viewed in terms of the primary goals of their architectures. The question [Figure 10] had 13 choices; the top five (respondents could select all that apply) were:

• Facilitate data and application integration: 78.3%

• Facilitate data sharing: 76.2%

• Reduce cost and complexity of interfaces: 53.8%

• Meet data governance objectives: 53.1%

• Ensure data naming standards are complied with: 52.4%

Figure 10 (143 respondents): primary goals of the Data Architecture. (Bar chart, 0% to 90%. Choices: Ensure data format standards are complied with; Ensure data naming standards are complied with; Facilitate data and application integration; Facilitate data sharing; Improve application development quality and …; Improve application development efficiency by limiting …; Guide or standardize vendor software choice; Ensure future compatibility; Reduce cost and complexity of interfaces; Ensure system performance; Meet budget constraints and guidelines; Encourage innovation in data-driven development; Meet data governance objectives.)

Analysis of Results

These results can be interpreted in the context of the emerging recognition that the data produced by functional systems is more than just a byproduct of those functions, and that the data sets can be repurposed to create business value.

The fundamental expectations for data repurposing are reflected in the highest-ranking goals of information architecture: facilitating data and application integration, facilitating data sharing, and reducing the costs and complexity of interfaces. Together, these objectives are the foundations for data interoperability, exchange, and consequently, reuse and repurposing.

At the same time, the contradictory nature of some answers sheds light on what might be seen as an industry-wide gap in communicating what does or does not belong in an organization’s information architecture. For example, although most respondents noted that database design and data virtualization should be included within an information architecture, those same two practice areas headed the list of those that should not be included.

Lastly, one might draw a less-than-optimistic conclusion about the plans for a future information architecture. The shares of respondents reporting complete or partial documentation for future data systems and needs are lower than those for existing needs.

Alternatively, though, this can be seen as a positive development, in recognizing that the existing hodgepodge of capabilities will not meet future needs. This indicates awareness of business information needs and the desire to evolve the information architecture to meet future demands.

5. DA AND DATA MODELING

Data modeling is a central element of any well-developed data and information infrastructure. A data model, whether it represents the physical/logical levels or higher levels of abstraction such as conceptual or canonical, is the roadmap of data through all of the many systems used in an enterprise.

The information stream of an enterprise cannot be relied upon if the data spaces are not mapped. Database tables, columns, relationships, properties, and keys must be identified at the physical level. The domain concepts, extended relationships, entity types, and attributes must be understood at the logical level. And the business relationships, associations, semantics, core concepts, and a wide range of others need to be effectively represented at the conceptual level.
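As an illustration of the three levels just described, the sketch below represents one business concept at the conceptual, logical, and physical levels and traces a table back to the concept it serves. The `Customer` concept, its attributes, and the table and column names are all hypothetical and are not drawn from the survey.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualEntity:
    """Conceptual level: a core business concept and its associations."""
    name: str
    definition: str
    related_to: list = field(default_factory=list)


@dataclass
class LogicalAttribute:
    name: str
    data_type: str  # technology-neutral type, e.g. "text"
    is_identifier: bool = False


@dataclass
class LogicalEntity:
    """Logical level: an entity type with attributes, independent of any DBMS."""
    name: str
    realizes: ConceptualEntity
    attributes: list


@dataclass
class PhysicalColumn:
    name: str
    sql_type: str
    primary_key: bool = False


@dataclass
class PhysicalTable:
    """Physical level: table, columns, and keys as deployed in a database."""
    name: str
    implements: LogicalEntity
    columns: list

    def ddl(self) -> str:
        cols = [
            f"  {c.name} {c.sql_type}" + (" PRIMARY KEY" if c.primary_key else "")
            for c in self.columns
        ]
        return f"CREATE TABLE {self.name} (\n" + ",\n".join(cols) + "\n);"


def trace(table: PhysicalTable) -> str:
    """Walk from the physical table back up to the business concept."""
    logical = table.implements
    return f"{table.name} -> {logical.name} -> {logical.realizes.name}"


# Hypothetical example data.
concept = ConceptualEntity("Customer", "A party that buys products", ["Order"])
logical = LogicalEntity("Customer", concept, [
    LogicalAttribute("customer id", "integer", is_identifier=True),
    LogicalAttribute("full name", "text"),
])
table = PhysicalTable("crm_customer", logical, [
    PhysicalColumn("customer_id", "INTEGER", primary_key=True),
    PhysicalColumn("full_name", "VARCHAR(200)"),
])

print(table.ddl())
print(trace(table))
```

Keeping explicit links between the levels is one possible design choice; it lets a reader move from a deployed table to the business meaning behind it without consulting separate documents.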

The usage of different models certainly varies from enterprise to enterprise. Some employ business process models, application models, enterprise data models, integration models, and a wide variety of others, often under names specific to the organization.

There is often disagreement about the explicit use of particular models, though the overall employment of models remains the same: models, no matter the abstraction level of their utilization, provide a valuable (and necessary) alignment structure of the voluminous data and information entities within an organization.

Without viable models, then, the movement of data throughout all of the systems (whether databases, integration tools, reporting and decision making applications, metadata repositories, data warehouses and data marts, MDM platforms, business process management systems, from the lowest physical abstraction to the highest enterprise-wide conceptual level) is suspect. Mature data models help assure that data quality measures are effective, data heritage and lineage systems are trustworthy, impact analysis and change management protocols are accurate, and that common day-to-day users and consumers of that data can do their jobs.
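The role of models in impact analysis can be made concrete with a small sketch. It assumes that data movement between systems has been recorded as a simple directed graph; the system names below are hypothetical, and the breadth-first traversal is just one straightforward way to list everything downstream of a changed source.

```python
from collections import deque

# Hypothetical lineage: each key feeds the systems listed as its value.
LINEAGE = {
    "orders_db": ["etl_nightly"],
    "crm_db": ["etl_nightly", "mdm_hub"],
    "etl_nightly": ["warehouse"],
    "mdm_hub": ["warehouse"],
    "warehouse": ["sales_mart", "bi_dashboard"],
    "sales_mart": ["bi_dashboard"],
    "bi_dashboard": [],
}


def downstream_impact(lineage, changed):
    """Return every system reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in lineage.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream_impact(LINEAGE, "crm_db"))
```

If the graph is incomplete or out of date, the list of affected systems is too, which is why the accuracy of the underlying models matters for change management.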

Survey Results and Statistics

… at the needed levels. The survey examined the issue of data modeling and data architecture with two questions.

The first question [Figure 11] asked about how data modeling is used within the respondent’s organization and had seven different choices. The top three were:

• Starting point for application development/modeling: 42%

• Business intelligence: 11.3%

• Support application integration: 10.7%

The question also gave respondents the ability to select “other” (which 12%, or 18 of them, did) and then give a write-in response. The primary answers in the “other” category were physical and logical database design, communication with the business side of the enterprise, and “all of the above,” since respondents could select only one answer.

Figure 11 (150 respondents): how data modeling is used within the respondent’s organization. (Bar chart, 0% to 45%. Choices: Other (please specify); Starting point for application development/modeling; Support application integration; Support a metadata repository; Business intelligence; Information systems planning; It is not used at all.)

The second question [Figure 12] concerned itself with the use of an enterprise data model. It asked if the respondent’s organization had such a high-level data model. The results really highlight an important aspect of this survey — most organizations either do not have such a model (43%) or only have it partially completed (40.9%). Only 8.1% have a fully completed enterprise data model.

Analysis of Results

In light of our speculation regarding the historical siloed approach to application development, it is likely that applications were developed with their own versions of data models, even when those modeled items were practically identical from system to system. Therefore, it is very positive to see many respondents indicate that data models are now being used as the starting point for application development, instead of just being one subsidiary task to the development of support for application functionality. This suggests a growing awareness of the importance and criticality of a well-defined underlying data model, to support the suite of applications across the organization.


This sentiment is reinforced by the answers to the second question, in which practically half of the respondents indicated an organizational commitment toward maintaining an enterprise data model. Yet again, though, this conclusion must be filtered through the lens of the respondent population — approximately 75% of participants provided answers to these questions.


6. DA AND METADATA MANAGEMENT

Data models provide the topographical maps of enterprise information systems, and metadata is what holds everything together. Metadata is the foundation of an enterprise’s information systems; it provides assurance that any given information asset is consistent, functioning, and observable through its entire lifecycle, from creation to expiration. Quality metadata is the substructure for every CRM, ERP, BPM, MDM, DG, BI, or other information-driven resource and system within an enterprise. Metadata management is therefore essential to a successful Information Architecture strategy. Without it, data models may be inaccurately mapped, reports may be incorrect, business processes may be imprecise, and the enterprise’s entire flow of data-to-information transformation, migration, integration, and tracking will lack certifiable accuracy.

Survey Results and Statistics

On the topic of Metadata Management within the overall discussion of DA and IA, the survey asked two questions. The first question [Figure 14] asked respondents about the tools/technologies used to manage metadata at their organization. The question had seven choices; respondents could select all that apply. The top three answers were:

• Spreadsheets: 53.1%

• Shared business term glossary: 39.3%

• Purchased metadata repository tool: 34.5%

The second question [Figure 15] enquired about the types of metadata maintained and provided three choices — operational, technical, and business. Once again, respondents could select all that apply; the top answer was technical metadata at 71.5%:

Figure 14 (145 respondents): tools and technologies used to manage metadata. Categories: Other (please specify); Shared business term glossary; Data dictionary tool; Spreadsheets; Purchased metadata repository tool; Home-grown metadata repository; Wiki. Axis: percentage of respondents, 0% to 60%.

Analysis of Results

The answers to Figure 14 about the methods and tools used to manage metadata are quite encouraging. In the past, any metadata documentation that was captured would commonly live in word-processing documents or spreadsheets, and would be limited to technical aspects of data-element usage (such as data-element names, data types, and lengths). Such documentation served mostly as a record and was not generally actionable. The emergence of more specialized tools for managing metadata can enable better sharing of metadata, as well as tighter integration with data-validation and data-integration techniques.

The fact that numerous respondents indicated using specialized tools suggests that maturing organizations seek to actualize the value of sharing, and ultimately standardizing, data-element definitions across the enterprise.
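The difference between passive documentation and actionable metadata can be illustrated with a minimal sketch. It assumes a small shared glossary that records each data element's name, data type, maximum length, and business definition, and then reuses those entries for data validation. The names `DataElement`, `GLOSSARY`, and `validate`, as well as the two sample elements, are hypothetical and are not drawn from the survey or from any particular metadata tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One metadata entry: technical attributes plus a business definition."""
    name: str
    data_type: type
    max_length: int
    business_definition: str


# Hypothetical shared glossary, keyed by data-element name.
GLOSSARY = {
    "customer_name": DataElement("customer_name", str, 40, "Legal name of the customer."),
    "country_code": DataElement("country_code", str, 2, "Two-letter country code."),
}


def validate(record):
    """Check a record against the glossary and return a list of problems."""
    problems = []
    for field, value in record.items():
        element = GLOSSARY.get(field)
        if element is None:
            problems.append(f"{field}: no metadata definition")
        elif not isinstance(value, element.data_type):
            problems.append(f"{field}: expected {element.data_type.__name__}")
        elif isinstance(value, str) and len(value) > element.max_length:
            problems.append(f"{field}: longer than {element.max_length}")
    return problems
```

A record such as `{"customer_name": "Alice", "country_code": "USA"}` would be flagged because the country code exceeds the documented length. The same definitions serve as documentation for people and as rules for programs, which is the kind of sharing a spreadsheet alone does not readily provide.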

Figure 15 (130 respondents): types of metadata maintained. Categories: Operational metadata; Technical metadata; Business metadata. Axis: percentage of respondents, 0% to 80%.

7. DA AND DATA INTEGRATION

Data doesn’t just magically become information. It must be transformed from basic data storage and collection systems into useable information resources for it to provide serviceable value to an enterprise. Such a transformation requires a complex system of data integration and migration.

There are several ways to complete the transformations, but overall the stages generally include (with many variations): initial compilation of data assets into pre-warehouse storage structures (such as repositories, CRM, ERP, flat files etc.); cleansing/transformation/classification of assets through some form of ETL or other integration system; collection of transformed assets in some type of data warehouse, data mart, ODS, or other system; and use of those assets as information in frontend BI and analytics platforms such as OLAP, data visualization, data mining, reporting mechanisms, and a multitude of others.

Such a breakdown simplifies processes undertaken in numerous ways by every enterprise; but the essential focus remains the same — enterprises must have a way to transform their data assets into information assets, which can be readily and successfully utilized by end users. Data integration and migration facilitate such transformations.
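The stages outlined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any respondent's system: the flat-file contents, the `sales` table, and the function names `extract`, `transform`, `load`, `report`, and `run_pipeline` are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from an operational system.
RAW_CSV = """customer_id,name,country,amount
1, Alice Smith ,us,120.50
2,Bob Jones,US,80
3,,de,15.25
"""


def extract(text):
    """Stage 1: read rows from the pre-warehouse source (a flat file here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Stage 2: cleanse and standardize; rows without a name are rejected."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # incomplete record, dropped during cleansing
        cleaned.append(
            (int(row["customer_id"]), name, row["country"].strip().upper(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows):
    """Stage 3: collect the transformed rows in the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, name TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def report(conn):
    """Stage 4: front-end use of the data, here total sales per country."""
    cur = conn.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
    )
    return dict(cur.fetchall())


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    return report(conn)
```

Calling `run_pipeline()` returns the per-country totals after cleansing. Even in this toy form, the transformation step carries most of the rules, which is consistent with respondents ranking data transformation as the most widely accepted part of data integration.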

Survey Results and Statistics

The survey approached the problem of data integration with two questions. The first asked respondents about the percentage of data integration done manually versus automatically in their organizations. The top choice was “25-49%” at 25.9%, though the three top choices were all quite close, demonstrating a wide variety of integration practices within the industry [Figure 16]:


The next question asked respondents to further clarify their opinions on data integration, by examining what parts they think should be included under data integration. They had nine different choices, and could select all that apply. The top four choices in Figure 17 were:

• Data transformation: 88.9%

• Data extraction: 81.9%

• Data loading: 79.2%

• Data replication: 61.8%

Figure 16 (139 respondents): share of data integration done manually versus automatically. Response ranges: 0%; 1-24%; 25-49%; 50-74%; 75-99%; 100%. Axis: percentage of respondents, 0% to 30%.

Analysis of Results

Automated data integration tools have been in the market for over twenty years; so many vendors have provided this capability that it has become somewhat of a commodity.

Therefore, it is surprising to see that over a third of the respondents perform half or more of their data integration manually, and that another 25% perform at least a quarter of their data integration manually.

There are two ways to interpret this result.

One is that manual data transformation is ingrained within the organizational culture, or that the cost of transitioning outweighs the benefits of automation, making it challenging to move to automated tools.

Figure 17 (144 respondents): what should be included under data integration. Categories: Data Access; Data Extraction; Data Transformation; Data Loading; Data Replication; Change Data Capture; Data Federation; Data Virtualization; Messaging. Axis: percentage of respondents, 0% to 100%.

The other is that the complexity of many data integration tasks remains high, and that even with the available tools, the level of expertise required to use automated tools, and to transition existing manual data integration tasks to automation, may lie beyond the means of numerous organizations.

On a positive note, though, the results do indicate that automated data integration is employed in some way by over 90% of the respondent organizations.

The results of Figure 17 about what should be included within data integration were quite encouraging, as most of the choices were selected by at least 50% of those responding, the exceptions being data virtualization and messaging. There is a growing movement within the Data Management industry to view data integration within the context of the end-to-end data lifecycle for any scenario in which data is in motion. One conclusion that can be drawn from this question is that, even with growing acceptance of that perspective, vendors still have a great opportunity to communicate their products’ values as part of an enterprise data integration framework (especially data virtualization and messaging).

8. DA AND MASTER DATA MANAGEMENT

Master Data Management (MDM) is no longer an up-and-coming concept in the Data Management industry. MDM has been around for years now. Many enterprises have some level of MDM, and the value of MDM is no longer in question.

If such a premise is true, then why do so many organizations still not have mature MDM programs in place?

Dependable master data is critical to business success. Master data (whether it’s customer, product, employee, supplier, financial, or any other type) is one of the foundational data building blocks that provide an enterprise with a verifiable chain of accuracy and consistency along the entire data stream. A mature MDM system allows an enterprise to reliably track customer information, sales figures, and HR records, create marketing campaigns, reduce marketplace risk, increase efficiency within its supply chains, and meet innumerable other objectives vital to any successful organization. An immature or imprecise MDM system does just the opposite.

Survey Results and Statistics

Since the importance of MDM cannot be overstated, the survey asked a total of five questions about the MDM programs of the respondents’ organizations. When taken as a whole, these questions really underscore the need for more mature MDM systems to be implemented at the enterprise level. MDM is a vital component in a well-developed IA infrastructure.

The first question (of five) inquired about how long the respondents’ organizations have had their most mature MDM program in place [Figure 18]. The results were quite split, with “no MDM programs in place” holding the first place at a surprising 37.2% and “5+ years” in second place at 21.7%:


Figure 18 (129 respondents): length of time the most mature MDM program has been in place. Categories: 5+ years; 3-4 years; 2 years; 1 year; Less than a year; No MDM programs in place. Axis: percentage of respondents, 0% to 40%.

The next question covered the types of data currently being managed with an MDM program and allowed respondents to answer all that apply. Customer data ranked first at 76.7%, product data second at 57.8%, and employee data third at 37.1% [Figure 19].

The third [Figure 20] and fourth [Figure 21] questions of this section asked where the organization’s Enterprise Data Warehouse gets its data from, and whether the organization has a golden copy of Master Data as part of its MDM architecture.

Figure 19 (116 respondents): What types of data are you currently managing (and planning to manage) with an MDM program? Categories: Customer Data; Supplier Data; Product Data; Employee Data; Reseller/Agent/Franchisee Data; Other (please specify). Axis: percentage of respondents, 0% to 90%.

It does not come as a surprise that most organizations use an ODS as the primary source for the Enterprise Data Warehouse, at a resounding 63.8%. It is a common architectural configuration.

The final question of this section asked about methods of MDM employed. It had five different choices, and respondents could check all that apply. The primary method was a consolidation hub at 44.4%, with a registry/index in second place at 39.5% [Figure 22].

Analysis of Results

Again, we must view the answers to this set of questions in the context of the number of participants responding. This number is under two thirds of the participant pool at most, and under half when reviewing the answers to Figure 22 about MDM methodology.

Figure 22 (81 respondents): methods of MDM employed. Categories: Registry/index; Consolidation hub; Coexistence hub; Transaction hub; Multi-domain. Axis: percentage of respondents, 0% to 50%.

Two speculative explanations for this low response rate suggest themselves. Either there is no MDM activity, or there are MDM projects whose successes are so valuable that their existence is a competitive advantage that must be carefully protected. That being said, the summary of the answers to Figure 19 about master data domains is no surprise. In most cases an organization is trying to master customers, products, or both.

The relatively high percentage of respondents indicating the creation of an employee master domain indicates an interest in optimizing human resource management in different ways: from an operational standpoint (by ensuring a unified view of the same employee across the onboarding and employee management processes) or from a strategic standpoint (in awareness and management of the corporate skill sets, expertise, categorization for compliance (e.g. full-time employee vs. contractor), and geographic locations of staff members).

The question of whether the master data architecture provides a “golden copy” is interesting, in that it reflects a general focus on data consolidation and survivorship as the intent of master data management, as opposed to collaborative access to shared information about unique entity identities.
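The contrast drawn here, between consolidation with survivorship and shared access to entity identities, can be made concrete with a minimal sketch. The survivorship rule used below (the most recently updated non-empty value wins) is only one possible rule and is an assumption, as are the sample records, the `CUST-001` identifier, and the function names `golden_record` and `registry_entry`.

```python
from datetime import date

# Hypothetical records describing the same customer in two source systems.
SOURCE_RECORDS = [
    {"source": "crm", "source_key": "C-17", "updated": date(2013, 3, 1),
     "name": "A. Smith", "email": "asmith@example.com", "phone": None},
    {"source": "erp", "source_key": "9001", "updated": date(2013, 5, 9),
     "name": "Alice Smith", "email": None, "phone": "555-0100"},
]


def golden_record(records, fields=("name", "email", "phone")):
    """Consolidation: build one surviving record per entity.

    Assumed survivorship rule: for each field, the most recently
    updated non-empty value wins.
    """
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    return {
        field: next((r[field] for r in newest_first if r[field]), None)
        for field in fields
    }


def registry_entry(records, master_id="CUST-001"):
    """Registry/index: leave the data in the sources and keep only cross-references."""
    return {master_id: [(r["source"], r["source_key"]) for r in records]}
```

With the sample data, `golden_record(SOURCE_RECORDS)` takes the name and phone from the newer ERP record and the email from the CRM record, while `registry_entry(SOURCE_RECORDS)` stores nothing but pointers back to the two systems.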

One follow-up to Figure 21, in which the respondents were asked about the existence of a “golden copy” as part of the master data architecture, would examine whether the 58% that do not use a golden copy designed their program that way from the start, or whether they originally used a single golden copy and later opted for alternate methods of sharing enterprise data about unique entities.

9. DA AND DATA VIRTUALIZATION

Data Virtualization (DV), while not a new concept, is still quite misunderstood. Data virtualization allows an organization to make its enterprise data easily available to business users. From a more technical standpoint, data virtualization is a form of middleware that leverages high-performance software and an advanced computing architecture to integrate and deliver data from multiple, disparate sources in a loosely-coupled, logically-federated manner. It differs from the traditional ETL/Data warehouse solutions by leaving the data in place – in the originating data sources – and extracting it as and when needed by the consuming applications. With the growth of data and complexity of IT infrastructures over the past decade or so, data virtualization is becoming ever-more important. It now can provide numerous benefits to enterprises in many different arenas. Some of those benefits include:

• Gaining more business insights by leveraging all your data – Empowering people with instant access to all the data they want, the way they want it.

• Responding faster to your ever changing analytics and BI – Five to ten times faster time to solution than traditional data integration.

• More cost effective than data replication and consolidation – Reduces unnecessary copying of data. Data virtualization’s streamlined approach reduces complexity and saves money.
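The core idea described above, leaving data in its originating sources and retrieving it only when a consuming application asks, can be illustrated with a minimal sketch. It is not modeled on any vendor's product: the class `VirtualCustomerView`, the `customers` table, and the in-memory order store are hypothetical stand-ins for two disparate sources.

```python
import sqlite3


class VirtualCustomerView:
    """Hypothetical federated view: combines two sources at query time and stores no copy."""

    def __init__(self, crm_conn, orders_by_customer):
        self._crm = crm_conn               # source 1: a relational database
        self._orders = orders_by_customer  # source 2: another system's order store

    def customer_summary(self, customer_id):
        # Each call goes back to the sources, so results reflect their current state.
        row = self._crm.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        orders = self._orders.get(customer_id, [])
        return {"name": row[0], "order_count": len(orders), "total": sum(orders)}


def demo():
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, 'Alice Smith')")
    orders = {1: [100, 50]}

    view = VirtualCustomerView(crm, orders)
    before = view.customer_summary(1)
    orders[1].append(25)  # the change is made in the source, not in a copy
    after = view.customer_summary(1)
    return before, after
```

In `demo()`, the second summary already includes the new order even though nothing was reloaded, which is the practical difference from an ETL approach that would show the change only after the next load.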

Yet, even with those benefits listed, numerous enterprises are not moving forward with virtualization technologies. In the first analysis section of this paper [Section 4–Addressing Enterprise Needs], the ambiguous nature of data virtualization in the modern enterprise was demonstrated clearly. Figure 9 had respondents answer about which elements of Data Management should or should not be included in their data architectures. DV was in first place in what “should be included” at 58.9%, and first place in what “should not be included” at 18.6%. Clearly, it remains one of the least-understood and least-utilized of all the elements discussed in this paper.

Therefore, to help provide further clarity to this often misunderstood architectural technology, the survey asked five questions about the utilization of data virtualization at the enterprise level.


Survey Results and Statistics

The initial DV question of the survey asked which statement best represents the respondent’s organizational view of DV. The top two answers [Figure 23] establish where DV stands within the Data Management industry and why more education is necessary:

• We are not very familiar with DV: 32.3%.

• We know what DV is, but not considering seriously at this time: 28.6%.

When viewed in terms of the next two questions [Figures 24 and 25], the most prevalent path to DV becomes clearer. Each of the following questions was rated on a 1-5 scale, with 5 being the most likely to use or best option. The results will be shown in two separate formats, as a percentage and as a rating average. The best use of DV for the respondents’ organizations is as an Agile BI Enabler (19.8%/3.03 rating) and for Access to New Data Sources (11.8%/2.96 rating):

Figure 23 (133 respondents): Which of the following best represents your company’s view of Data Virtualization? Categories: We are using or actively pursuing adoption of Data Virtualization technologies; We know what DV is, and are keen to learn more about its benefits and uses; We know what DV is, but not considering seriously at this time; We are not very familiar with DV. Axis: percentage of respondents, 0% to 35%.

The main factors or “pain points” that are pushing the respondents’ organizations towards DV integration into their existing systems are:

• Real-Time or On Demand access to information: 26.9%/3.21

• Reduce Replication of Data/Silos: 22.6%/3.18

• Time to Market/Agility: 24.3%/3.27
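Results in this section are quoted in two formats, a percentage and a rating average, and the report does not spell out how either is computed. The sketch below assumes the rating average is the response-weighted mean of the 1-5 scores and that the percentage is the share of respondents giving the top score; both are assumptions, and the response counts used in the example are hypothetical, not survey data.

```python
def rating_summary(counts, top=5):
    """Summarize rating-scale responses.

    counts maps each rating (1 through top) to the number of respondents
    who chose it. Returns (rating average, percentage choosing the top rating).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses")
    average = sum(rating * n for rating, n in counts.items()) / total
    top_share = 100.0 * counts.get(top, 0) / total
    return round(average, 2), round(top_share, 1)


# Hypothetical distribution of 100 responses on a 1-5 scale.
EXAMPLE_COUNTS = {1: 20, 2: 20, 3: 30, 4: 20, 5: 10}
```

Under these assumptions the two formats can rank options differently: an option with many top scores but also many low scores can lead on percentage while trailing on rating average, which would be consistent with the ordering differences seen in the bullets above.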

Figure 24 (123 respondents): How and where would you use Data Virtualization? (rate each on scale of 1-5, 5 being most likely to use). Options: Enterprise Strategy – Implement at enterprise / broad level to create Data Services / IaaS across analytical and operational uses; Agile BI Enabler – Component to add agility to BI, EDW, MDM initiatives; Single View Applications – Support Portal, Call Center etc. initiatives; Managed Migration – Abstraction Layer for managed migration, mergers, acquisitions; Access New Data Sources – Integrate Unstructured, Semi-Structured, Web, Cloud data more easily. Axis: 0 to 140.

When asked to consider the most preferred approach to DV, respondents were given three different choices. “Best of Breed Data Virtualization Platform” had the highest percentage at 16.7%, though the highest rating average was for “BI Tools with Integrated Federation Capability” at 2.90 [Figure 26].

Figure 25: What are the main factors or pain points with current integration approach that is driving you to consider Data Virtualization (Rate 1-5). Options: Time to Market / Agility; Lower Integration Costs; Real-time or On Demand access to information; Reduce Replication of Data / Silos; Complexity / Heterogeneity – Access XML, Big Data, NoSQL, Unstructured, Web; Abstraction – Unified Business Views of Data; Data Services Delivery – Secure enterprise data sharing. Axis: 0 to 120.

The final question for this section asked respondents to rank their criteria for the selection of a DV tool. They were given seven separate choices, with a possible ranking of 1-4 (4 being the best). The top three choices (they could select more than one) were [Figure 27]:

• Pricing/Total Cost of Ownership: 39.6%/3.12

• Performance, Caching, Scalability Features: 31.4%/2.84

• Ability to Handle Structured and Unstructured Data: 28%/2.49

Figure 26: Which approach to Data Virtualization do you support more (Rate on 1-5 scale). Options: Extension to Incumbent Data Integration Vendors’ Products; BI Tools with integrated Federation Capability; Best of Breed Data Virtualization Platform. Axis: 0 to 120.

Analysis of Results

The results of these questions suggest that it is still early in the consideration and adoption of data virtualization as part of an enterprise data strategy.

For example, the scores of the different choices for Figure 24 about ways of using data virtualization were generally even. Yet, the relatively high score in Figure 24 for using data virtualization for accessing new data sources and integrating unstructured and semi-structured data, cloud data, and web data somewhat contrasts with the relatively low score for Figure 25 (about pain points and drivers for data virtualization) “Complexity/Heterogeneity–Access XML, Big Data, NoSQL, Unstructured, Web.” One might infer that accessing new (and “big”) data sources is less of a priority, and that in fact providing faster access to data in the data warehouse (“… add agility to BI, EDW” and “real-time or On-Demand access to information”) is the more critical driver for introducing data virtualization into the enterprise today.

Figure 27: Rank the Criteria for selection of DV tool (Rate on 1-4 scale). Options: Source Breadth – Access to most number of sources; Specify your most important sources (other than databases); Ability to handle structured and unstructured data; Modeling, Transformation, Governance Capabilities; Performance, Caching, Scalability Features; Data Services Publishing Options (SQL, Web Services, JSON, Portlets); Pricing / Total Cost of Ownership. Axis: 0 to 120.

10. CONCLUSION–FUTURE CONSIDERATIONS

The end goal of a fully functional Information Architecture is to provide a 360° perspective of all the information and data assets within a given enterprise. Yet, as this report has demonstrated, most organizations have not achieved such a perspective.

The reasons for this are as myriad as the organizations attempting to create better-architected systems, but each organization needs to address them if it hopes to remain competitive within its market. Information Architecture takes a concerted effort by the entire enterprise, with a comprehensive roadmap that outlines each successive stage of its development. Some enterprises need to begin simply by developing formal definitions of the most important terminology within their various architectural systems; others need to focus more on the responsibilities and roles within their information management structures. Certain organizations that lack an enterprise data model would benefit greatly from creating one, while others should focus on the interoperability of their many disparate platforms before advancing further.

Certainly, implementing MDM systems effectively, automating data integration services, and developing better decision-making protocols, all while lowering costs and streamlining business processes, remain important for many enterprises, while some have already tackled those mountains and are moving forward into new realms. The essential point, though, is that a well-developed Information Architecture is not a one-off project that, once completed, simply disappears and works perfectly. Information Architecture is the creation of an organizational understanding between all lines-of-business, between executives and IT, between business users and technical users, between everyone who interacts with data and information at any level, that being able to create, distribute, analyze, and consume reliable data is vital to the success of the organization.

The benefits of creating a sustainable Information Architecture are certainly well documented. The matter of implementing the necessary changes to leverage those benefits remains elusive.


ABOUT THE AUTHORS

CHARLES ROE

FREELANCE WRITER, CR SCRIBES / WRITER AND EDITOR, DATAVERSITY

Charles Roe, freelance writer & founder of CR Scribes, is backed with advanced degrees in English, History and a Cambridge degree in Language Instruction. He worked for 10 years as an instructor of English, History, Culture and Writing at the college level in the USA, Europe and Turkey. He grew up working for a family-owned business in the construction industry, has owned and operated a web design and hosting company, a photo studio, has written numerous academic papers and worked as a professional copyeditor/proofreader for close to 15 years. He spent many years after graduate school working in the high tech industry in tech support, as a database analyst for an ophthalmic software design company and a part-time server administrator. He writes on a variety of topics, including more technical topics, for a host of businesses. He writes creatively in his spare time.


DAVID LOSHIN

PRESIDENT, KNOWLEDGE INTEGRITY

David Loshin is the President of Knowledge Integrity, Inc. (www.knowledge-integrity.com), a consulting company focusing on customized information management solutions including information quality consulting and training, business intelligence, metadata, and data standards management. David is among Knowledge Integrity’s recognized experts in information management, contributing to Intelligent Enterprise, DM Review, and The Data Administration Newsletter (www.tdan.com) and is a channel expert for the Business Intelligence network (www.b-eye-network.com).

David’s most recent book is “The Practitioner’s Guide to Data Quality Improvement,” and he has authored “Master Data Management” published by The MK/OMG Press in September 2008. His book, “Business Intelligence: The Savvy Manager’s Guide” (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing, and how all of the pieces work together.” David has created courses for The Data Warehousing Institute, presented at DAMA/Meta Data conferences, taught tutorials on data quality at a number of venues, and as a representative of Knowledge Integrity is often called upon to provide insights and thought leadership to the Information Management community.


CA ERwin® Data Modeling provides a collaborative data modeling environment to manage enterprise data through an intuitive, graphical interface. CA Technologies (NASDAQ: CA) is an IT management software and solutions company, enabling customers to manage and secure their IT environments and deliver flexible IT services.

www.erwin.com

Embarcadero Technologies, Inc. is a leading provider of award-winning tools for data architects, application developers, and DBAs. Its flagship data modeling solution, ER/Studio, combines business, data and application modeling into a multi-level design environment – enabling data architects to build enterprise-scale databases, and document and publish models and metadata to improve alignment between business and IT, more effectively use enterprise data as a corporate asset, and fully support compliance and data governance initiatives.

www.embarcadero.com/products/er-studio

Contact:

[email protected]

www.embarcadero.com/products/er-studio

415.834.3131 x3


Denodo is the leader in Data Virtualization – providing unmatched performance, unified access to the broadest range of enterprise, big data, cloud and unstructured sources, and the most agile data services provisioning and governance – at less than half the cost and several times the speed of traditional data integration. Denodo’s reference customers in every major industry have gained significant business agility and ROI by creating a unified virtual data layer that serves strategic enterprise-wide information needs for agile BI, big data analytics, web and cloud integration, single-view applications, and SOA and fully RESTful Linked Data Services. Denodo delivers complete product, training and consulting services (with partners) maximizing customer success.

www.denodo.com

Contact:

[email protected]

+1 877.556.2531

HP Vertica is the next-generation analytics platform that enables customers to monetize all of their data. HP Vertica’s elasticity, scale, performance, and simplicity are unparalleled in the industry, delivering 50x-1000x the performance of traditional solutions at 30% the total cost of ownership. HP Vertica powers some of the largest organizations and most innovative business models globally including Zynga, Twitter, Verizon, Guess Inc., Admeld, Capital IQ, Mozilla, AT&T, and Comcast.

DATAVERSITY™ provides resources for information technology (IT) professionals, executives and business managers to learn about the uses and management of data. Our worldwide community of practitioners, advisers and customers participates in, and benefits from, DATAVERSITY’s educational conferences, discussions, articles, blogs, webinars, certification, news feeds, and more. Members enjoy access to a deep knowledge base of presentations, research and training materials, plus discounts off many educational resources including webinars and conferences.
