Models of Reality vs Models of Data - Data and Reality_ A Timeless Pe

One thing we ought to have clear in our minds at the outset of a modeling endeavor is whether we are intent on describing a portion of “reality” (some human enterprise), or a data processing activity.

Great discussion of this topic in Data Modeling Theory and Practice, by Graeme Simsion. That is, is data modeling really “design” or something different?

Most models describe data processing activities, not human enterprises.

They pretend to describe entity types, but the vocabulary is from data processing: fields, data items, values. Naming rules don't reflect the conventions we use for naming people and things; they reflect instead techniques for locating records in files (cf. [Stamper 77]).

Failure to make the distinction leads to confusion regarding the roles of symbols in the representation of entities, and some mixed ideas of “domain.”

Steve's Takeaways

The data model is a bridge for representing information, and at the same time sufficiently structured and simplistic as to fit well into computer technology. “Data modeling” is the process of eliciting business requirements and organizing the data to produce a “data model.” The data model becomes the artifact that can be used and reused to avoid performing the same data modeling activities again and again.

A model is a basic system of constructs used in describing reality. It reflects a person's deepest assumptions regarding the elementary essence of things.

Some information technology professionals have been so overwhelmed by the success of a certain technology for processing data that they have confused this technology with the natural semantics of information.

Looking at the same information landscape through the eyes of a particular project, and then through the eyes of the enterprise, often produces two very different pictures.

The 1975 dream of having all attributes understood and “mapped” is still not a reality for most organizations.

CHAPTER 8 The Record Model

Records provide a very efficient basis for processing data. They enable us to map out very regular storage structures. They make it easy to write iterative programs for processing large volumes of data. They make it easy to partition data into convenient units for moving around, locking up, creating, destroying, etc.

In short, record technology reflects our attempt to find efficient ways to process data. It does not reflect the natural structure of information. Senko refers to “a major commitment to particular restrictive representations like the arrays of scientific computation, the extensional aspects of set notations, the n-tuples of relations, the cards, records, files, or data sets of commercial systems and the static categories of natural language grammars. Each of these representations has great merit for its original area of study, and in turn it has made major contributions to the study of information systems. Nonetheless, each provides only an approximate fit to the evolving, heterogeneous, interconnected information structures required to model real world enterprises” [Senko 75b]. Sowa observes: “Historically, database systems evolved as generalized access methods. They addressed the narrow issue of enabling independent programs to cooperate in accessing the same data. As a result, most database systems emphasize the questions of how data may be stored or accessed, but they ignore the questions of what the data means to the people who use it or how it relates to the overall operations of a business enterprise” [Sowa 76].

Record technology is such an ingrained habit of thought that most of us fail to see the limitations it forces on us. It didn't matter much in the past, because our real business was record processing almost by definition. But we want to approach the logical model a little differently. We want it to reflect information, rather than data processing technology. When different applications deal with the same information using different record technologies, those differences shouldn't clutter up the logical model. (And we might want to consider the possibility of future data technologies that are not so record oriented.)

The logical data model is so valuable because it is technology- independent, and therefore can be used many times as a starting point for multiple physical data models. Even with technologies that are not relational such as NoSQL databases, the logical data model still applies.

When I use the term “record,” I have in mind a fixed linear sequence of field values, conforming to a static description contained in catalogs and in programs. A record description consists largely of a sequence of field descriptions, each

specifying a field name, length, and data type. Each such record description determines one record type.

One field (sometimes a combination of several fields) is often designated as the key, whose values uniquely distinguish and identify occurrences of this type of record.

As far as the system is concerned, a field name signifies a space in the record occupied by data in a certain representation. Any other semantic significance of the field name is perceived only by the user.

Some record formats allow a certain variability by permitting a named field or group of fields to occur a variable number of times within a record (i.e., as a list of values or sets of values). I will use the term normalized system to refer to systems that do not permit repeating groups or fields. This follows from the relational model, which excludes such repetitions via its normalization requirements (specifically, first normal form; [Codd 70], [Kent 73]).

First normal form (1NF) requires that all attributes on the model contain only a single piece of business information, and that the same attribute can appear once, at most, in an entity. For example, in the two models on the facing page, the one on the left is not in 1NF. This is because there are three phone numbers, so the ‘same’ attribute appears three times. Also, Customer Name contains both the first and last name of the customer, so Customer Name includes two pieces of information instead of just a single piece. The model on the right is in 1NF because the phone numbers have been separated into their own table, and Customer First Name and Customer Last Name appear as two separate attributes.

In document Data and Reality_ A Timeless Pe - William Kent.pdf (Page 109-113)