ISSUES AND OPPORTUNITIES FOR IMPROVING THE QUALITY AND USE OF DATA WITHIN THE DOD

Academic year: 2021


THE MISSION: EXPLORE AND ADDRESS THE SPECIFIC MEANS TO ASSESS THE IMPACT AND MAGNITUDE OF DATA QUALITY.

In short: How does the Department of Defense ensure that the right data are in the hands of the right people at the right time? This is a question that has always been an issue, but in recent decades, as data become more distributed and available, in many more media and forms, it has become particularly significant. The specific means to assess the impact and magnitude of data quality remain unexplored and unaddressed.

The goals of today’s effort are to

• identify root causes of problems with data

• address the need for determining the provenance of data and making it available to users

• investigate the extent to which data collection, data analysis, and reporting infrastructure requirements enable and hinder the production of high quality data and information

• characterize ways to maximize the use of data (for example, data-mining methods and improved query-posing) to support decision making

• identify potential technical solutions to the problems identified during the workshop

IDENTIFY ROOT CAUSES OF PROBLEMS WITH DATA

There are three major categories of data that are the proper subject of a data quality effort: Operational Data, Reference Data, and Decision Data.

Operational Data are the transactions that are incurred during the course of carrying out the Department’s activities. These may be concerned with procurement, deployment, or other areas of activity. These are the fundamental data that are the record of what the Department does. A transaction’s data do not change once they are recorded.

Reference Data are the lists that describe the terms of reference for the operational data. These may be large lists, describing products or personnel, or short lists describing the values for status, product colors, and the like. These data do change over time, and maintaining them in sync with the operational data is a major assignment for the data quality effort.

Decision Data are the summarizations and other analyses that constitute the information used by officers to make decisions. These are typically distillations of operational data, organized in terms of reference data, and further massaged to identify patterns, trends, and the like.

Managing the quality of these three categories requires different specific processes, but the principles of maintaining data quality apply across the three.
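The relationship among the three categories can be sketched in a few lines of Python. This is an illustration only, not a description of any Department system: the class names, field names, and the validity-date convention are all assumptions made for the example. It shows operational records that cannot be altered once recorded, reference entries that change over time, a check for operational records that have fallen out of sync with the reference data, and a simple summary standing in for decision data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded transaction does not change
class Transaction:
    transaction_id: str
    item_code: str        # points into the reference data
    quantity: int
    recorded_on: date


@dataclass
class ReferenceEntry:
    code: str
    description: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the entry is still in force


def out_of_sync(transactions, reference):
    """Return transactions whose item_code had no valid reference entry
    on the day the transaction was recorded."""
    problems = []
    for t in transactions:
        entry = reference.get(t.item_code)
        valid = (
            entry is not None
            and entry.valid_from <= t.recorded_on
            and (entry.valid_to is None or t.recorded_on <= entry.valid_to)
        )
        if not valid:
            problems.append(t)
    return problems


def summarize(transactions, reference):
    """Decision data: total quantity per reference description."""
    totals = {}
    for t in transactions:
        entry = reference.get(t.item_code)
        label = entry.description if entry else "UNKNOWN"
        totals[label] = totals.get(label, 0) + t.quantity
    return totals
```

A summary built this way is only as good as the match between the two inputs: every record flagged by `out_of_sync` is a record that the summary either mislabels or lumps under "UNKNOWN".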

Issues

Your author is not in a position to describe the specific data quality problems of the Department of Defense. It is likely, however, that they can be characterized as follows:

• The large, dispersed and diverse organization structure has resulted in the development of “silos”, where each functional area has optimized its efforts to address the specific problems of that area. Thus systems for communicating between areas are massively complex.

• Part of the problem with systems communicating is that the people involved have different vocabularies.

• Measuring the quality is difficult, even though everyone knows when it’s bad.

• Determining the source of problems is difficult.

• Addressing those problems is difficult.

This essay is based on three premises:

1. Managing the Departmental asset that is its data and resulting information is fundamentally the responsibility of the Department’s management, just as they are responsible for human resources, money, and physical capital. The abstractness of this asset, however, makes managing it quite different from managing the other, more tangible assets.

2. The value of data in databases is potential only. Data only have value when they move and become information.

3. Data have structure. The true, underlying structure of the concepts described by the Department’s data is considerably simpler than most of their embodiments in real databases. That being so, understanding this simpler underlying structure is essential to:

• Design more coherent ways to manage and store data.

• Design more coherent processes to transmit and monitor the quality of data.

ADDRESS THE NEED

This exercise is concerned with determining the provenance of data and making high quality data available to users.

NOTE: As described in Premise 1, this is not an Information Technology Department assignment. Information Technology specialists are essential to implementing solutions, but the complete solutions are not technological. They have to do with the way the information resource is managed.

Here are some essential steps:

Get enough of the right people involved

Who the right people are depends on the particular circumstances of the part of the Department being addressed. Who they should not be is a team of three people from the bowels of the Information Technology Department. The people assigned to the project must have visibility in the Department, as well as authority to make required organizational and procedural changes.

Emphasis must also be placed on the customers of the data, who require the data to do their jobs, and the producers of the data who will ultimately be held accountable for the data’s quality. This also includes the owners of each of the processes involved in transmitting the data from the producers to the customers.

Overseeing the requirements of the producers and customers is the data steward. This is the person who is ultimately responsible for the correctness of a specific sub-set of the data. This person may be a producer of the data, or may be simply responsible for data production. More significantly, this is the person customers of the data can call on if there is a problem.1

Data modelers and architects must be involved to assure that the data are organized in a coherent way that reflects the underlying structure of the organization.

Tom Redman recommends establishing a Chief Data Officer and a Data Council “to formulate, lead, coordinate and push the effort.”2 He cites the example of a major securities company that has created such a “CDO”. In this case, the individual involved “views his role as being the architect and coach of the enhanced data management system. He is helping define and approve data policies, developing an efficient and cost-effective operating model, providing governance, and overseeing the convergence of technology platforms and distribution of data.”3

1 See Brackett, pages 212-213, for an extensive discussion of the data steward role.

The Data Council comprises the organization’s most senior people who are willing to review and approve policies. “The council can be a new group, or, preferable, new responsibilities for an existing senior leadership body, such as the operations committee.”

Involve business strategy

Success in any data and information improvement project in the Department of Defense requires it to be intimately tied to the Department’s overall strategies. Moreover, the effort must be based on the Department’s goals and objectives. Do the data being addressed contribute value to the Department’s meeting those goals and objectives? If not, don’t bother.

In some cases, the strategy itself may be affected by the new opportunities that may be uncovered by better access to information.

Assess quality

Once the team has been defined, the first order of business is to assess the quality of data and information currently available. This begins with asking top officers and civilian managers what data are critical to their most important decisions. Then each of the stakeholders in the enterprise’s processes (both sources and customers of data) is asked about the data he or she uses and the data (and information) produced. Ultimately the quality of data must be determined by their customers, but it is also important to understand all that goes into their creation.

Data are representations of facts about things.4 These are the numbers and characters that describe the world. Data quality is a measure of the “correctness” of a body of data, in terms of the following.5

• Definition integrity – This is a measure of the extent to which instances of a particular attribute are compatible with the definition of that attribute.

• Duplication – This is a measure of data’s being duplicated within or across systems: for example, what percentage of customer records are duplicates of others?

• Timeliness and availability – This is a measure of how current data presented for use are, relative to the requirements for that use. In some cases, data must be available in a split second. In other cases it is acceptable for the data to be delivered within a day after they are created.

• Data Decay – This is a measure of the rate at which the relevance of data decays over time.

3 See Redman, pages 197-199.

4 The word “Data” is the plural form of the word “datum”, which originally meant something (an assertion, that is) that is given.
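Three of the data quality dimensions listed above lend themselves to simple ratios. The sketch below is one possible way to compute them, not a method prescribed by the sources cited here. The record layout (dictionaries with `created_at` and `delivered_at` fields), the function names, and the rule of comparing keys after trimming and lower-casing are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def definition_integrity(values, conforms):
    """Fraction of attribute values compatible with the attribute's definition.
    `conforms` is a caller-supplied test that encodes the definition."""
    if not values:
        return 1.0  # nothing recorded, so nothing violates the definition
    return sum(1 for v in values if conforms(v)) / len(values)


def duplication_rate(records, key_fields):
    """Fraction of records whose key fields repeat an earlier record."""
    seen = set()
    duplicates = 0
    for record in records:
        # Assumption: keys match after trimming whitespace and ignoring case.
        key = tuple(str(record[f]).strip().lower() for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records) if records else 0.0


def timeliness_rate(records, max_delay):
    """Fraction of records delivered within max_delay of their creation."""
    if not records:
        return 1.0  # nothing delivered, so nothing was late
    on_time = sum(
        1 for r in records if r["delivered_at"] - r["created_at"] <= max_delay
    )
    return on_time / len(records)
```

The acceptable delay is passed in rather than fixed because, as noted above, it depends on the use: `timedelta(seconds=1)` for one customer, `timedelta(days=1)` for another.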

Information is the result of interpreting one or more data and putting them in context. Note that much of the information we have comes from data, but some does not.

Information quality is a measure of how accessible and appropriate a body of data is for its designated use. It is dependent on the development of a data architecture that allows understanding of the context of each datum. This includes both the underlying way data are organized as well as the way they are presented to the customer.

Interestingly enough, data architecture and data presentation require two very different orientations:

• Data are most effectively managed if their structure is in terms of universal concepts. That is, the particular things seen by a particular customer are understood to be but examples of a more general concept. This approach to data structure means that as the particulars of a business situation change, these changes can be addressed with changes to configuration data values, without having to change the underlying structure of a model (or, more significantly, of a database).

• Presentation, on the other hand, must be exactly in the terms that the customer understands and is accustomed to. Each customer has a particular “view” of the data, which tends to be in the very concrete terms of his or her daily experience. The idea is to map each presentation view to the more universal structure of the underlying model of the data.

The dimensions of information quality, then, include:

• Data specification – This is measured in terms of the existence, completeness, quality, and documentation of data standards, data models, business rules, metadata, and reference data.

• Ease of use and maintainability – This is measured in terms of how readily data can be retrieved and used, in the context of a particular customer’s view of the world.

• Presentation quality – Related to the previous dimension, this is measured in terms of how readable and understandable the data are. This includes everything from report organization to simple aesthetics.

• Perception, relevance and trust – This is measured in terms of the importance, value, and relevance of the data to business needs.

• Transactability – This is measured in terms of the degree to which data will produce the desired business transaction or outcome.6

INVESTIGATE THE EXTENT TO WHICH

• data collection, data analysis, and reporting infrastructure requirements

• … enable and hinder the production of high quality data and information.

This question is best answered by someone from the Department of Defense.

CHARACTERIZE WAYS TO MAXIMIZE THE USE OF DATA

Fundamentally, there are three kinds of improvements to make:

• Improve processes – Data capture can be made intuitive. Some of the required data checks can be built in so that it is not possible to violate them. For others, the sequence of events required can be made intuitive enough that violations are unlikely to recur. Process issues, of course, can go beyond “How are documents gathered and sorted?” Should documents be more readable? Are there problems with the data collection equipment? There are many possibilities. Be open to them all.

• Improve data model – If the data are badly organized, with duplicates and poorly related tables, then errors are more easily committed. Any time the same information must be entered more than once—especially if it is by different people—there is the opportunity for one of those to be wrong. If reference data must be entered for each transaction, the opportunity for errors is increased.

If a customer doesn’t understand the underlying structure of the data, it is possible to formulate a query that returns something very different from what is expected. Worse, it can return something that is completely wrong and the customer doesn’t know that it’s wrong. In some cases this is because the data model did not correctly reflect the underlying structure of the world. In others, it is because the customer simply didn’t understand the implications of the model.

• Improve training – And of course, along with improvements in either the processes or the data model, training must be an ongoing process. This includes both teaching customers the implications of the data structure in their data model and teaching the people responsible for capturing data how to do so.
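The first two improvements above can be illustrated together. The following is a minimal sketch under stated assumptions: the record type, the status values, and the item list are hypothetical and stand in for whatever a real capture process would use. Checks are built into the record itself, so an invalid entry cannot be created, and the item description is looked up from the reference list rather than typed again for each transaction.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # A short reference list: only these values can ever be recorded.
    REQUESTED = "requested"
    SHIPPED = "shipped"
    RECEIVED = "received"


# Hypothetical item reference list, maintained in one place.
ITEMS = {"A1": "Field radio", "B2": "Spare battery"}


@dataclass(frozen=True)
class CapturedRecord:
    item_code: str
    quantity: int
    status: Status

    def __post_init__(self):
        # Built-in checks: a record that fails any of them is never created.
        if self.item_code not in ITEMS:
            raise ValueError(f"unknown item code: {self.item_code!r}")
        if not isinstance(self.quantity, int) or self.quantity <= 0:
            raise ValueError("quantity must be a positive whole number")
        if not isinstance(self.status, Status):
            raise ValueError("status must be a Status value")

    @property
    def item_description(self):
        # Looked up, not re-typed, so it cannot disagree with the reference list.
        return ITEMS[self.item_code]
```

For example, `CapturedRecord("A1", 3, Status("shipped"))` succeeds, while a misspelled status such as `Status("shiped")` or an unknown item code raises `ValueError` at the point of capture instead of surfacing later in a report.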

REFERENCES

Amidon, D. M. 1997. Innovation Strategy for the Knowledge Economy. (Boston: Butterworth-Heinemann).

Brackett, M. H. 2000. Data Resource Quality. (Boston: Addison-Wesley).

English, L. P. Improving Data Warehouse and Business Information Quality.

Juran, J. and A. Blanton Godfrey. 1998. Juran’s Quality Handbook, 5th Edition. (New York: McGraw-Hill).

McGilvray, D. 2008. Executing Data Quality Projects. (Boston: Morgan Kaufmann).

Redman, T. 2008. Data Driven: Profiting From Your Most Important Asset. (Boston: Harvard Business Press).
