ANATOMY OF A DATA MODEL

CHAPTER OBJECTIVES

• Provide a refresher on data modeling at different information levels

• Present a real-world case study

• Display data model diagrams for the case study

• Scrutinize and analyze the data model diagrams

• Arrive at the steps for creating the conceptual data model

• Provide an overview of logical and physical models

In Chapter 1, we covered the basics of the data modeling process. We discussed the need for data modeling and showed how a data model represents the information requirements of an organization. Chapter 1 described data models at different information levels. Although an introductory chapter, it even discussed the steps for building a data model. Chapter 1 has given you a comprehensive overview of fundamental data modeling concepts.

As preparation for further study, Chapter 2 introduced the various data modeling approaches. In that chapter, we discussed several data modeling techniques and tools, evaluating each and comparing them with one another. Some techniques are well suited as a communication tool with the domain experts; others are slanted more toward database practitioners, who use the model as a database construction blueprint. Of the techniques covered there, entity-relationship (E-R) modeling and Unified Modeling Language (UML) are worth special attention mainly because of their wide acceptance. In future discussions, we will adopt these two methodologies, especially the E-R technique, for describing and creating data models.

In this chapter, we will get deeper into the overall data modeling process. For this purpose, we have selected a real-world case study. You will examine the data model for a real-world situation, analyze it, and derive the steps for creating the data model. We intend to make use of E-R and UML techniques for the case study. By looking at the modeling process for the case study, you will understand a practical approach to applying the data modeling steps.

Data Modeling Fundamentals. By Paulraj Ponniah. Copyright © 2007 John Wiley & Sons, Inc.

First, let us understand how to examine a data model, what components to look for, and how the model is composed. In particular, we will work on the composition of a conceptual data model. Then, we will move on to the case study and present the data model diagrams. We will proceed to scrutinize the data model diagrams and review them, component by component. We will examine the anatomy of a data model.

This examination will lead us into the steps that will produce a data model. In Chapter 1, you had a glimpse of the steps. Here the discussion will be more intense and broad. You will learn how each set of components is designed and created. Finally, you will gain knowledge of how to combine and put all the components together in a clear and understandable data model diagram.

DATA MODEL COMPOSITION

Many times so far we have reiterated that a data model must act as a means of communication with the domain experts. For a data modeler, the data model is your vehicle for verbalizing the information requirements with the user groups. You have to walk through the various components of a data model and explain how the individual components and the data model as a whole represent the information requirements of the organization. First, you need to point out each individual component. Then you should describe the relationships. After that, you show the subtle elements. Overall, you have to get confirmation from the domain experts that the data model truly represents their information requirements.

How can you accomplish all of this? In this section, we will study the method for scrutinizing and examining a data model. We will learn what to look for and how to describe a data model to the domain experts. We will adopt a slightly unorthodox approach. Of course, we will start with a description of the set of information requirements. We will note the various business functions and the data used by those functions. However, instead of going through the steps for creating a data model for the set of information requirements, we will present the completed data model. Using the data model, we will try to describe it as if we are communicating with the domain experts. After that, we will try to derive the steps for creating the data model. We will accomplish this by using a comprehensive case study. So, let us proceed with the initial procedure for reviewing the set of components of a data model.

Models at Different Levels

You will recall the four information levels in an organization. Data models are created at these four information levels. We went through the four types of data models: external data model, conceptual data model, logical data model, and physical data model. We also reasoned out the need for these four types of data models.

The four types of data models must together fulfill the purposes of data modeling. At one end of the development process for a data system is the definition and true representation of the organization’s data. This representation has to be readable and understandable so that the data modelers can easily communicate with the domain experts. At the other end of the development process is the implementation of the data system. In order to do this, we need a blueprint with sufficient technical details about the data. The four types of data models address these two separate challenges. Let us quickly revisit the four types of data models.

Conceptual Data Model. A conceptual data model is the highest level of abstraction to represent the information requirements of an organization. At this highest level, the primary goal is to make the representation clear and comprehensible to the domain experts. Clarity and simplicity dictate the underlying construct of a conceptual data model. Details of data structures, software features, and hardware considerations must be totally absent in this type of data model.

Essentially, the data model provides a sufficiently high-level overview of the basic business objects about which data must be stored and available in the final data system.

The model depicts the basic characteristics of the objects and indicates the various relationships among the objects. Despite its simplicity and clarity, the data model must be complete with all the necessary information requirements represented without any exceptions. It should be a global data model for the organization. If ease of use and clarity are prime goals, the conceptual data model must be constructed with simple generic notations or symbols that could be intuitively understood by the user community.

External Data Model. At the conceptual level, the data model represents the information requirements for the whole organization. This means that the conceptual data model symbolizes the information requirements for the entire set of user groups in an organization. Consider each user group. Each user group has a specific set of information requirements. It is as if a user group looks at the total conceptual data model from an external point of view and indicates the pieces of the conceptual data model that are of interest to it. Then that part of the conceptual data model is a partial external data model specific for that user group. What about the other user groups? Each of the other groups has its own partial data model.

The external data model is the set of all the partial models of the entire set of user groups in an organization. What happens when you combine all the partial models and form an aggregate? The aggregate will then become the global conceptual model. Thus, the partial models are a high-level abstraction of the information requirements of individual user groups. Similar to the conceptual data model, the external data model is free from all complexities about data structures and software and hardware features. Each partial model serves as a means of communication with the relevant user group.
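The aggregation just described can be pictured with a small Python sketch. It is only an illustration: the user groups and entity type names below are hypothetical and do not come from the case study. Each partial model is treated as a set of entity types, and the global conceptual model is taken to be their union.

```python
# Hypothetical user groups and entity types, invented for illustration only.
partial_models = {
    "sales": {"CUSTOMER", "ORDER", "PRODUCT"},
    "shipping": {"ORDER", "SHIPMENT", "CARRIER"},
    "accounting": {"CUSTOMER", "ORDER", "INVOICE"},
}


def combine_partial_models(partials):
    """Aggregate the partial external models into one set of entity types."""
    combined = set()
    for entity_types in partials.values():
        combined |= entity_types
    return combined


conceptual_model = combine_partial_models(partial_models)

# Each partial model is a view of (a subset of) the global conceptual model.
views_are_subsets = all(
    entity_types <= conceptual_model for entity_types in partial_models.values()
)

print(sorted(conceptual_model))
print(views_are_subsets)
```

A real conceptual model also carries relationships, attributes, and constraints; reducing it to a set of names is a simplification that keeps the part-to-whole idea visible.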

Logical Data Model. The logical data model brings data modeling closer to implementation. Here the type of database system to be implemented has a bearing on the construction of the data model. If you are implementing a relational database system, the logical data model takes one specific form. If it is going to be a hierarchical or network database system, the form and composition of the logical data model differ. Even so, considerations of a specific DBMS (particular database software) and of hardware are still kept out.

As mentioned earlier in Chapter 1, if you are implementing a relational database system, your logical model consists of two-dimensional tables called relations with columns and rows. In the relational convention, data content is perceived in the form of tables or relations. Relationships among the tables are established and indicated through logical links using foreign key columns. More details on foreign keys will follow later on.
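As a rough illustration of tables linked through a foreign key column, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for this example; SQLite is used only because it needs no setup, not because the text prescribes any particular DBMS.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        -- foreign key column: the logical link back to the customer relation
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    )
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO customer_order VALUES (100, '2007-01-15', 1)")
conn.execute("INSERT INTO customer_order VALUES (101, '2007-02-03', 1)")

# Following the foreign key reassembles the relationship "customer places order".
rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer AS c
    JOIN customer_order AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

The relationship is not stored as a separate structure: it exists only as matching values in the two `customer_id` columns, which is what "logical link" means here.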

Physical Data Model. A physical data model is far removed from the purview of domain experts and user groups. It has little use as a means of communication with them. At this information level, the primary purpose of the data model is to serve as a construction blueprint, so it has to contain complex and intricate details of data structures, relationships, and constraints. The features and capabilities of the selected DBMS have enormous impact on the physical data model. The model must comply with the restrictions and the general framework of the database software and the hardware environment where the database system is being implemented.

A physical data model consists of details of how the database gets implemented in secondary storage. You will find details of file structures, file organizations, blocking within files, storage space parameters, special devices for performance improvements, and so on.

Conceptual Model: Review Procedure

In this chapter, we are going to concentrate mainly on the conceptual data model. Once we put together the conceptual data model correctly, we can arrive at the lower level models by adopting standard transformation techniques. Therefore, understanding conceptual modeling ranks higher in importance.

In Chapter 1, we introduced the components of a conceptual data model and reviewed some examples. You know the main parts of the model, and that all the parts hang together in a model diagram. In this chapter, we intend to review conceptual data model diagrams in greater detail. We will be reviewing model diagrams drawn using E-R and UML techniques.

Let us say we are presented with a conceptual data model diagram. How could we go about scrutinizing the diagram and understanding what the diagram signifies? What are the information requirements represented by the diagram? What do the components signify? Are there any constraints? If so, how are they shown in the diagram? On the whole, how will the domain experts understand the diagram and confirm that it is a true representation?

In Chapter 1, you noted the various symbols used to represent data model components. Chapter 2 expanded the meaning of the notations as prescribed in various modeling techniques. At this time, let us formulate a systematic approach to reviewing a data model diagram. Let us consider an E-R data model diagram. The systematic approach lends itself to adoption for other modeling techniques as well. We will apply the formulated systematic approach to the data model diagrams to be presented in the next section for the case study.

First and foremost, we need to make a list of all the various notations used in the diagram and the exact nature of the symbols. What does each notation signify? What does it represent? What is the correlation between an element in the real-world information requirements and its representation in the data model diagram? Essentially, a database contains data about the business entities or objects of an organization. What are the business entities for the organization? So, we look for the representations of business entities or objects in the data model diagram. The business entities in a company are all connected in some way or other. Customers place orders. Clients buy at auctions. Passengers make reservations on airline flights. The business objects denoting customers and orders are related. Similarly, the business objects of passengers and flights are related. Therefore, the next logical step in the review of a data model diagram would involve the examination of relationships among objects. Pursuing this further, we can formulate a systematic approach to the examination and description of a data model.

Let us summarize these steps:

Symbols and Meanings. Study the entire data model diagram. Note the symbols and their meanings.

Entity Types. Observe and examine all the entity types or objects displayed, one by one.

Generalization/Specialization. Notice if any superset and subsets are present. If they are shown, examine the nature and relationships between each group of subsets and their superset.

Relationships. Note all the relationship lines connecting entity types. Examine each relationship. Note the cardinalities and any constraints.

Attributes. Inspect all the attributes of each entity type. Determine their meanings.

Identifiers. Check the identifier for each entity type. Verify the validity and uniqueness of each identifier.

Constraints. Scrutinize the entire diagram for any representations of constraints. Determine the implication of each constraint.

High-Level Description. Provide an overall description of the representations.
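Of the steps above, the Identifiers check is the easiest to mechanize once sample instances are available. The following Python sketch assumes instances are plain dictionaries; the function name `duplicate_identifiers` and the sample customers are hypothetical and serve only to show what "verify the uniqueness of each identifier" could mean in practice.

```python
from collections import Counter


def duplicate_identifiers(instances, identifier_attrs):
    """Return the identifier values that occur in more than one instance.

    An empty result means the chosen attributes identify every sample
    instance uniquely.
    """
    keys = [tuple(inst[attr] for attr in identifier_attrs) for inst in instances]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample instances of a CUSTOMER entity type.
customers = [
    {"customer_no": 1, "name": "Ann"},
    {"customer_no": 2, "name": "Raj"},
    {"customer_no": 2, "name": "Lee"},
]

clashes = duplicate_identifiers(customers, ["customer_no"])
print(clashes)
```

Passing the sample data proves nothing about future data, so a check like this can only reject a candidate identifier. Confirming one still depends on the domain experts.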

Conceptual Model: Identifying Components

Before proceeding to the comprehensive case study in the next section, let us take a simple small conceptual data model diagram. We will study the diagram and examine its components using the systematic approach formulated in the previous section. This will prepare you to tackle the larger and more comprehensive model diagrams of the case study. Figure 3-1 shows the conceptual data model diagram for a magazine distributor.

Let us examine the conceptual data model diagram using a systematic approach.

Symbols and Meanings. The model diagram represents the information requirements using the E-R modeling technique. Note the square-cornered boxes; these represent the entity types. You find six of them indicating that information relates to six business objects. Observe the lines connecting the various boxes. A line connecting two boxes indicates that the business objects represented by those two boxes are related; that is, the instances within one box are associated with instances within the other. The diamond or rhombus placed on a relationship line denotes the nature of the association. Also, note the indicators as a pair of parameters at either end of a relationship line. These are cardinality indicators for the relationship.

Notice the ovals branching out from each entity box. These ovals or ellipses embody the inherent characteristics or attributes for the particular entity type. These ovals contain the names of the attributes. Note that the names in certain ovals for each entity type are underscored. The attributes for each box with underscored names form the identifier for that entity type.

In the model diagram, you will observe two subset entity types as specializations of a supertype entity type. Although the initial version of the E-R model lacked provision for indicating supersets and subsets, later enhancements included these representations.

Entity Types. Look at the square-cornered boxes in the data model diagram closely. In each box, the name of the entity type appears. Notice how, by convention, these names are printed in singular and usually in uppercase letters. Hyphens separate the words in multi-word names. What does each entity type box represent? For example, the entity type box PUBLISHER symbolizes the complete set of publishers dealing with this magazine distributing company. You can imagine the box as containing a number of points, each of which is an instance of the entity type, with each point indicating a single publisher.

Notice the name of one entity type MAGAZINE enclosed in a double-bordered box. This is done to mark this entity type distinctly in the diagram. MAGAZINE is a dependent entity type; its existence depends on the existence of the entity type PUBLISHER. What do we mean by this? For an instance of the entity type MAGAZINE to exist or be present in the database, a corresponding instance of the entity type PUBLISHER must already exist in the database. Entity types such as MAGAZINE are known as weak entity types; entity types such as PUBLISHER are called strong entity types.
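The existence dependency of MAGAZINE on PUBLISHER can be tried out with a small sketch. It uses Python's built-in sqlite3 module, and the column names, the helper `add_magazine`, and the sample values are all assumptions for illustration. The conceptual diagram itself says nothing about how the dependency would be implemented.

```python
import sqlite3

# In-memory database; column names and sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute(
    "CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE magazine (
        magazine_id  INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        -- a magazine row must always point at an existing publisher row
        publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
    )
""")


def add_magazine(magazine_id, title, publisher_id):
    """Return True if the row was stored, False if no owning publisher exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO magazine VALUES (?, ?, ?)",
                (magazine_id, title, publisher_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_stored = add_magazine(1, "Data Monthly", 1)  # publisher 1 does not exist yet
conn.execute("INSERT INTO publisher VALUES (1, 'Example Press')")
owned_stored = add_magazine(1, "Data Monthly", 1)   # now the publisher exists

print(orphan_stored, owned_stored)
```

The first insert is rejected and the second succeeds, which mirrors the rule that a MAGAZINE instance cannot be present unless its PUBLISHER instance already is.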

Generalization/Specialization. Notice the entity type boxes for INDIVIDUAL and INSTITUTION. These are special cases of the entity type SUBSCRIBER. Some subscribers are individuals and others are institutional subscribers. It appears that the data model wants to distinguish between the two types of entities. Therefore, these two types of subscribers are separated out and shown on their own. INDIVIDUAL and INSTITUTION are subtypes of the supertype SUBSCRIBER. When we consider attributes, we will note some of the reasons for separating out subtypes. Note also that an instance of the supertype is an instance of exactly one or the other of the two subtypes.

FIGURE 3-1 Conceptual data model: magazine distributor.

Observe how the connections are made to link the subtypes to the supertype and what kinds of symbols are used to indicate generalization and specialization. The kinds of symbols vary in the different CASE tools from various vendors.
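For readers who think in code, the supertype and its two subtypes resemble a small class hierarchy. The Python sketch below is only an analogy: the attribute names (`subscriber_no`, `occupation`, `contact_person`) are hypothetical, since the attributes shown in the figure are not reproduced here, and the helper `subtype_of` is an invented name.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    """Supertype: holds what is common to all subscribers (attributes assumed)."""
    subscriber_no: int
    name: str


@dataclass
class Individual(Subscriber):
    """Subtype with an attribute that applies only to individuals (assumed)."""
    occupation: str = ""


@dataclass
class Institution(Subscriber):
    """Subtype with an attribute that applies only to institutions (assumed)."""
    contact_person: str = ""


def subtype_of(subscriber):
    """Name the one subtype a subscriber belongs to.

    A bare Subscriber is rejected, reflecting the rule that every instance
    of the supertype is an instance of exactly one of the two subtypes.
    """
    if isinstance(subscriber, Individual):
        return "INDIVIDUAL"
    if isinstance(subscriber, Institution):
        return "INSTITUTION"
    raise TypeError("a subscriber must be an INDIVIDUAL or an INSTITUTION")


ann = Individual(1, "Ann", occupation="editor")
library = Institution(2, "City Library", contact_person="Raj")
print(subtype_of(ann), subtype_of(library))
```

Because neither subclass inherits from the other, no object can be both at once, and `subtype_of` refuses an object that is neither.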

Relationships. Note the direct relationships among the various entity types. The relationship lines indicate which pairs of entity types are directly related. For example, publishers publish magazines; therefore, the entity types PUBLISHER and MAGAZINE are connected by a relationship line. Find all the other direct relationships: MAGAZINE with EDITION, SUBSCRIBER with MAGAZINE, SUBSCRIBER with EDITION.

The model diagram shows two more relationship lines. These are between the supertype SUBSCRIBER and each of the subtypes INDIVIDUAL and INSTITUTION. Observe the special symbols on these relationship lines indicating generalization and specialization.

The names within the diamonds on the relationship lines denote the nature of the relationships. Whenever the model diagram intends to indicate the nature of the relationships, verbs or verb phrases are shown inside the diamonds. For example, the verb “publishes” indicates the act of publishing in the relationship between PUBLISHER and MAGAZINE. However, some versions of the data model consider relationships as objects in their own right. In these versions, relationship names shown inside the diamonds are nouns. Sometimes these would be concatenations of the two entity type names, for example, something like the compound word publisher-magazine.
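The direct relationships listed above, together with the two naming conventions, can be written down as plain data. In this Python sketch only the entity type pairs and the verb "publishes" come from the text. The remaining verbs are left as `None` rather than guessed, and the helper names `related_to` and `noun_name` are hypothetical.

```python
# (entity type, entity type, verb in the diamond); None where the text gives no verb.
relationships = [
    ("PUBLISHER", "MAGAZINE", "publishes"),
    ("MAGAZINE", "EDITION", None),
    ("SUBSCRIBER", "MAGAZINE", None),
    ("SUBSCRIBER", "EDITION", None),
]


def related_to(entity_type):
    """List the entity types directly connected to the given one."""
    partners = set()
    for left, right, _verb in relationships:
        if left == entity_type:
            partners.add(right)
        elif right == entity_type:
            partners.add(left)
    return sorted(partners)


def noun_name(left, right):
    """Build a noun-style relationship name by joining the entity type names."""
    return f"{left}-{right}".lower()


print(related_to("MAGAZINE"))
print(noun_name("PUBLISHER", "MAGAZINE"))
```

The generalization links between SUBSCRIBER and its subtypes are left out on purpose, because they are a different kind of connection from the named relationships.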

Let us consider the cardinality and optionality depicted in the relationships. The second parameter in the pair indicates the cardinality; that is, how many occurrences of one entity type may be associated with how many occurrences of the other entity type. The first
