
Data Description

LEVELS OF DESCRIPTION

The various people and applications using a database are likely to have different perceptions of the entities and information they are dealing with (employees vs. stockholders; employer implied by record type vs. employer as a field value). Different applications use different facts about entities, so that an employee record may look quite different in the personnel application and in the medical benefits application. It is also possible for these applications to use different data processing disciplines, i.e., different file types, access methods, and data structures. These generally provide different ways of representing relationships and different interfaces for manipulating the data.

Thus there is a level of description corresponding to the perceptions and expectations of various applications, specifying such things as record formats, data structures, and access methods. For some kinds of question answering systems, or systems with graphical displays, the descriptions might not even be couched in terms of record formats.

All these applications may be supported by a common pool of data, an integrated database. One significance of integration is that common attributes are synchronized; e.g., changing an employee's address also changes his address in the stockholder file, if he happens to be one. Synchronization may be achieved by maintaining the address in only one place, or by the system's recognizing that a change in one place must automatically be propagated to another place. The method doesn't matter, as long as the information appears synchronized to users.
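As a minimal sketch of the first technique, maintaining the address in only one place, the Python below keeps each fact once in a hypothetical in-memory store and derives each application's record format from it. The class `IntegratedStore` and the functions `personnel_view` and `stockholder_view` are illustrative names, not part of any real system.

```python
from dataclasses import dataclass, field


@dataclass
class IntegratedStore:
    """Hypothetical integrated database: each fact about a person is stored once."""
    addresses: dict[str, str] = field(default_factory=dict)  # person id -> address
    salaries: dict[str, int] = field(default_factory=dict)   # person id -> salary (employees)
    shares: dict[str, int] = field(default_factory=dict)     # person id -> shares held (stockholders)

    def change_address(self, person_id: str, new_address: str) -> None:
        # A single update; every view that reads the address sees the new value.
        self.addresses[person_id] = new_address


def personnel_view(db: IntegratedStore, person_id: str) -> dict:
    # The personnel application's (hypothetical) record format.
    return {"id": person_id, "address": db.addresses[person_id], "salary": db.salaries[person_id]}


def stockholder_view(db: IntegratedStore, person_id: str) -> dict:
    # The stockholder application's (hypothetical) record format.
    return {"holder": person_id, "mailing_address": db.addresses[person_id], "shares": db.shares[person_id]}


db = IntegratedStore()
db.addresses["e1"] = "12 Elm St"
db.salaries["e1"] = 50000
db.shares["e1"] = 100

db.change_address("e1", "99 Oak Ave")
print(personnel_view(db, "e1")["address"])            # 99 Oak Ave
print(stockholder_view(db, "e1")["mailing_address"])  # 99 Oak Ave
```

The second technique, propagation, would instead keep a copy per application and push each change to the other copies; either way, users of both views see the same address.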

Another significance of integration is that a new application may “borrow” data already in the database for the benefit of other applications. The new application's requirements can be mapped directly to the integrated database. Without integration, it can be difficult and often impossible to extract the data from several physically unrelated files and then merge it into a form useful to the new application.

The integrated database is the system's analog to the real world: it is that ongoing persistent thing of which different applications may have different perceptions.

Although the integrated database is the system's analog to the real world, most attempts at modeling the entire organization fall short because they fail to acknowledge the many different perceptions. Not only do we need to build an enterprise data model showing how an organization should ideally be perceived, we also need to “map” each of the concepts on this holistic model to each of the different perspectives. For example, the enterprise data model might view Person as distinct from the roles that a person can play such as Employee and Consumer, whereas the Employee Payroll System only has knowledge of Employee and the Order Entry System only has knowledge of Consumer. Bridging these different perceptions to the aspirational enterprise data model requires a mapping, and this mapping is essential to any enterprise modeling effort. Without it, one can read an enterprise model and say, “It's all very nice that a Person can play many roles, but where is my Consumer today?”
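One way to picture such a mapping is the Python sketch below. It assumes an enterprise-level `Person` that carries role-specific facts, plus two mapping functions that produce the payroll system's `Employee` and the order entry system's `Consumer`. All class names, fields, and functions here are hypothetical, chosen only to mirror the example in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    """Enterprise model: a person, independent of the roles they play."""
    person_id: str
    name: str
    roles: dict[str, dict] = field(default_factory=dict)  # role name -> role-specific facts


@dataclass
class Employee:
    """Hypothetical perception held by the Employee Payroll System."""
    employee_no: str
    name: str
    salary: int


@dataclass
class Consumer:
    """Hypothetical perception held by the Order Entry System."""
    customer_no: str
    name: str
    credit_limit: int


def to_employee(p: Person) -> Employee | None:
    # Map the enterprise Person to payroll's Employee, if the person plays that role.
    r = p.roles.get("Employee")
    return None if r is None else Employee(r["employee_no"], p.name, r["salary"])


def to_consumer(p: Person) -> Consumer | None:
    # Map the enterprise Person to order entry's Consumer, if the person plays that role.
    r = p.roles.get("Consumer")
    return None if r is None else Consumer(r["customer_no"], p.name, r["credit_limit"])


pat = Person("p1", "Pat", roles={
    "Employee": {"employee_no": "E-17", "salary": 50000},
    "Consumer": {"customer_no": "C-42", "credit_limit": 2000},
})
print(to_employee(pat))  # Employee(employee_no='E-17', name='Pat', salary=50000)
print(to_consumer(pat))  # Consumer(customer_no='C-42', name='Pat', credit_limit=2000)
```

The mapping functions are where the answer to “where is my Consumer today?” lives: each application keeps its own vocabulary, while the enterprise model keeps one Person behind both.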

Unlike the real world, however, we don't have the luxury of merely saying “it's there—make of it what you will, with your own eyes and ears and mind.” The database has to be described to the system.

We have a choice of describing the integrated database in “physical” terms, or in both “physical” and “logical” terms. Physical descriptions specify the location, format, and organization of the data on disks, tapes, or other storage media; the locations of key fields in records; the kinds of pointers used to reference related records; the criteria for physical contiguity of records, and the handling of “overflow” records; the kinds of indexes provided, and their locations; etc. Logical descriptions are more in terms of the information content of the database: the kinds of entities, the attributes, and the relationships among them.
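To make the distinction concrete, the Python sketch below pairs a logical description (the entity and its attributes) with a physical description of one possible record layout, expressed with the standard `struct` module. The layout (a 4-byte unsigned integer followed by fixed-width, space-padded text fields) and the helper names are assumptions for illustration only; real physical descriptions also cover pointers, indexes, contiguity, and overflow handling, which this omits.

```python
import struct

# Logical description: what information is held, nothing about storage.
EMPLOYEE_LOGICAL = {"entity": "Employee", "attributes": ["emp_no", "name", "dept"]}

# Physical description (hypothetical): how one record is laid out in storage.
# "<" = little-endian, no padding; I = 4-byte unsigned int; 20s/10s = fixed-width byte fields.
EMPLOYEE_PHYSICAL = struct.Struct("<I20s10s")


def pack_employee(emp_no: int, name: str, dept: str) -> bytes:
    """Encode a logical record using the physical layout (text is space-padded)."""
    return EMPLOYEE_PHYSICAL.pack(emp_no, name.encode().ljust(20), dept.encode().ljust(10))


def unpack_employee(raw: bytes) -> dict:
    """Decode stored bytes back into the attributes named by the logical description."""
    emp_no, name, dept = EMPLOYEE_PHYSICAL.unpack(raw)
    values = [emp_no, name.decode().rstrip(), dept.decode().rstrip()]
    return dict(zip(EMPLOYEE_LOGICAL["attributes"], values))


raw = pack_employee(1001, "Ada Smith", "Sales")
print(len(raw))              # 34
print(unpack_employee(raw))  # {'emp_no': 1001, 'name': 'Ada Smith', 'dept': 'Sales'}
```

Changing the physical layout, say widening `name` or switching byte order, touches only `EMPLOYEE_PHYSICAL` and the two helpers; the logical description stays the same. Note that `struct` silently truncates text longer than the field width.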

There are conceptual, logical, and physical data models. Conceptual data models are the highest level and often do not exceed a single piece of paper in terms of their size. Conceptual data models are used […] Product, along with their definitions. Logical data models are very detailed models that contain all of the attributes and relationships needed for the solution to the business problem. The physical data model compromises the logical data model mainly to make it work better with software and hardware tools, such as a particular database or reporting tool. So, the conceptual data model captures the business scope of the problem, the logical the business solution, and the physical the technical solution.

Recall this model from earlier in the text. It is a conceptual data model:

Adding all of the attributes, entity types, and relationships to this model produces the following logical data model:

Considering technology may lead us to the following physical data model:

[Figure: physical data model (entities include Butler, Murderer)]

For a detailed description of how to read these models and the tradeoffs incurred from logical to physical, please refer to the book Data Modeling Made Simple, 2nd Edition.

There is growing recognition of a need to provide and maintain these three levels of description ([ANSI], [GUIDE-SHARE]).

This separation into multiple levels of description is necessary to cope with change. Experience has shown that the way data is used changes with time. Application programs change the way they use the data. They change record formats, and they change the combinations of records they need to see in a single process. New applications need to see records containing data that had previously been split among several records. Other new applications need extensions to existing data (e.g., additional fields in old records), without disturbing the old applications. Applications sometimes change the data management technique which they use to access the data. As an increasing number of applications interact with an increasingly large integrated database, the effects of such changes become much more complex and more difficult to predict and control.

A need is emerging to manage the data in a manner that is insensitive to such changes. A new role is emerging—the database administrator. A large part of his job consists of defining and managing this mass of information as a corporate resource ([ANSI] splits out this part of the job into the role of “enterprise administrator”). He needs a way to describe this information purely in terms of “what kinds of information do we maintain in the system.” With this description (the logical model) as a reference, he can then separately specify the various formats in which this data is to be made available to application processes (the external models), and also the physical organizations in which the data is to exist in the machine (the internal model).
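A rough analogy can be sketched with SQLite through Python's standard `sqlite3` module: a base table plays the part of the logical model, views play the part of external models, and SQLite's own storage and indexing stand in for the internal model. The table, column, and view names are hypothetical. The sketch also shows the earlier point about extending existing records: adding a field for a new application leaves the old applications' views unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table standing in for the integrated database (hypothetical schema).
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, address TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Ada', '12 Elm St', 50000)")

# External models: each application sees only the columns it was written against.
conn.execute("CREATE VIEW payroll_v AS SELECT emp_no, name, salary FROM employee")
conn.execute("CREATE VIEW mailing_v AS SELECT name, address FROM employee")

# Later, a new application needs an additional field in the old records.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
conn.execute("UPDATE employee SET phone = '555-0100' WHERE emp_no = 1")
conn.execute("CREATE VIEW directory_v AS SELECT name, phone FROM employee")

# The old views still return the same shape; the new view sees the new field.
print(conn.execute("SELECT * FROM payroll_v").fetchall())    # [(1, 'Ada', 50000)]
print(conn.execute("SELECT * FROM mailing_v").fetchall())    # [('Ada', '12 Elm St')]
print(conn.execute("SELECT * FROM directory_v").fetchall())  # [('Ada', '555-0100')]
```

Each view lists its columns explicitly rather than using `SELECT *`, so what an old application sees is pinned by its own view definition rather than by the current shape of the base table.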

The 1978 role of database administrator has split into four roles over the last four decades: data modeler, data architect, database administrator, and database developer. The data modeler is responsible for translating the project business requirements into a data model. The data architect is responsible for ensuring consistency across data models or, as Kent says in the paragraph above, for “defining and managing this mass of information as a corporate resource.” The database administrator is responsible for maintaining existing databases so that they remain stable and perform well. The database developer is responsible for building new databases. In the 1980s and into the 1990s, there was also the role of a data administrator, responsible for maintaining the data models and other related metadata, such as data definitions. This role has largely been absorbed by the data modeler and data architect roles.

Besides its role in an operational database system, a logical model is also needed in the planning process. It provides the basic vocabulary, or notation, with which to collect the information requirements of various parts of the enterprise. It provides the constructs for examining the interdependencies and redundancies in the requirements, and for planning the information content of the database.

This book is essentially concerned with the logical model, i.e., the descriptions of the information content of the database. It reflects a perception of reality held by one person or group, in the role of database administrator. This administrator decides what portion of the real world is to be reflected in the database, and which constructs, conventions, models, assumptions, etc., are to be used. Although it is a single perception of reality, it must be broad and universal enough to be transformable into the perceptions of all the applications supported by the database.