

In document Data Modeling by Steve Hoberman (Page 140-153)

The Scorecard starts by assuming the model is perfect. As analysts, we sometimes notice immediately what is wrong. This can lead to quickly pointing out the negatives in designs, which in turn, can make us blind to what is good in the model, causing conflict or hard feelings among project team members. The Scorecard starts off with a perfect score of 100. We then subtract points from this score for categories where we identify areas that need improvement.
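The subtract-from-100 idea can be sketched in a few lines. This is only an illustration: the category names used below are the four described in this excerpt, and the point deductions are made-up values, not the Scorecard's actual weights.

```python
# Hypothetical deductions per category; the point values are
# illustrative assumptions, not the Scorecard's actual weights.
PERFECT_SCORE = 100

def scorecard_total(deductions: dict[str, int]) -> int:
    """Start from a perfect score and subtract points per category."""
    if any(points < 0 for points in deductions.values()):
        raise ValueError("deductions must be non-negative")
    return max(0, PERFECT_SCORE - sum(deductions.values()))

review = {"type": 0, "correctness": 5, "completeness": 3, "structure": 2}
print(scorecard_total(review))  # 100 - (0 + 5 + 3 + 2) = 90
```

A model with no findings keeps the full 100, which matches the premise that the review begins from "perfect" rather than from a list of faults.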

The Scorecard is objective and externally-defined. I have participated in model reviews where modelers take the review personally and comments take the form of "I don't like what you did here…" or "You are still not getting this structure right…" We need to step back from the 'I' and 'You', and critique the model with an external and objective perspective. Team rapport remains intact as you, the reviewer, are not criticizing their model, but rather evaluating how well the model meets pre-defined objectives, using an external scale to indicate areas for improvement.

The Scorecard is easy to apply and standardize. The Scorecard was designed to enable even those new to modeling to critique their own models and the models of their colleagues. It should be incorporated into your methodology as a final checkpoint before the model is considered complete.

Let's explore each of these ten categories in more detail.

1. How Well Do the Characteristics of the Model Support the Type of Model?

This is the "type" category. This question ensures that the type of model (subject area, logical, or physical, and then either relational or dimensional) has the appropriate level of detail. In general terms, the subject area model should contain a well-defined scope, the logical data model should be application-independent and represent a business solution, and the physical data model should be tuned for performance, security, and take into consideration hardware and software.

The physical data model should represent a technical solution. A dimensional model is built when there is a need to play with numbers, while a relational model is built for everything else.

This category is challenging to grade because the modeler and reviewer need to understand the definition of each model type. Also, during crunch times, one model might be created to represent both logical and physical views, for example, thus complicating the grading of this category. I call this type of model a "physio-logical" model.

Each subject area on a subject area model must be basic and critical. This means that it is core to the business being modeled, and that without this concept, the business being modeled would be very different or possibly non-existent. Customer, Product, and Employee are all examples of subject areas.

If a logical data model is relational, it should be fully normalized. If a logical model is dimensional, every hierarchy level should be shown and there should only be a single meter. In either case, the logical data model needs to be completely independent of software and hardware.

The physical data model represents context. Will the logical data model work with a given set of software and hardware? If there are opportunities to improve performance, space management, and security, for example, the model should be modified cautiously to allow applications to use it more efficiently.

For example, Figure 12.1 contains a relational logical data model that is not fully normalized: the many-to-many relationship between Employee and Department has not been resolved. Figure 12.2 contains the relational logical data model after this category catch has been fixed. A new associative entity has been added to resolve the many-to-many.

Figure 12.1: Category catch - Relational logical data model not fully normalized

Figure 12.2: Category catch fixed - Associative entity added
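Since the figures are not reproduced here, the following is a rough sketch of what resolving the Employee/Department many-to-many with an associative entity could look like in table form. The table name `employee_department`, the column names, and the sample rows are assumptions for illustration, not taken from Figure 12.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee   (employee_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative entity (hypothetical name): one row per Employee/Department pairing.
CREATE TABLE employee_department (
    employee_id   INTEGER NOT NULL REFERENCES employee(employee_id),
    department_id INTEGER NOT NULL REFERENCES department(department_id),
    PRIMARY KEY (employee_id, department_id)
);
""")

# Sample data: one employee assigned to two departments.
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(10, "Sales"), (20, "Finance")])
conn.executemany("INSERT INTO employee_department VALUES (?, ?)",
                 [(1, 10), (1, 20)])

departments_for_ann = [row[0] for row in conn.execute(
    "SELECT d.name FROM department d "
    "JOIN employee_department ed ON ed.department_id = d.department_id "
    "WHERE ed.employee_id = 1 ORDER BY d.name")]
print(departments_for_ann)  # ['Finance', 'Sales']
```

The composite primary key on the associative table is one common choice; it prevents the same pairing from being recorded twice.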

2. How Well Does the Model Capture the Requirements?

This is the "correctness" category. That is, we need to understand the content of what is being modeled. This can be the most difficult of all 10 categories to grade because we really need to understand how the business works and what they want from their application(s). If we are modeling a sales data mart, for example, we need to understand both how the invoicing process works in our company, as well as what reports and queries will be needed to answer key sales questions from the business.

We need to ensure that our model represents the data requirements, as the costs can be devastating if there is even a slight difference between what was required and what was delivered. Besides not delivering what was expected, there is the potential for the IT/business relationship to suffer. The model must support business expectations.

For example, Figure 12.3 contains a dimensional physical data model for a reporting application. One of the questions defined in the requirements is "Show me Net Sales by Customer and by Quarter." The model does contain Net Sales Amount, but the calendar view does not contain Quarter. Figure 12.4 shows the physical data model after this category catch has been fixed by replacing Month with Quarter. Note that when confirming this requirement, we might learn that both Month and Quarter are required, instead of one or the other. We might also learn that Quarter was not added deliberately, because it could be easily derived by aggregating three Months of data.

Figure 12.3: Category catch - Physical data model that does not answer the required business question

Figure 12.4: Category catch fixed - Month replaced by Quarter
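The remark that Quarter could be derived by aggregating three Months can be illustrated with a short sketch. It assumes calendar quarters (January to March is Q1, and so on); a fiscal calendar would need a different mapping. The row layout and sample figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical monthly fact rows: (customer, year, month, net_sales_amount).
monthly_sales = [
    ("Acme", 2023, 1, 100.0),
    ("Acme", 2023, 2, 150.0),
    ("Acme", 2023, 3, 50.0),
    ("Acme", 2023, 4, 200.0),
]

def net_sales_by_customer_and_quarter(rows):
    """Roll monthly Net Sales Amount up to calendar quarters."""
    totals = defaultdict(float)
    for customer, year, month, amount in rows:
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(customer, year, quarter)] += amount
    return dict(totals)

quarterly = net_sales_by_customer_and_quarter(monthly_sales)
print(quarterly)
# {('Acme', 2023, 1): 300.0, ('Acme', 2023, 2): 200.0}
```

Whether this derivation is acceptable, or whether Quarter should be stored explicitly, is exactly the kind of point to confirm with the business.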

3. How Complete is the Model?

This is the "completeness" category. This category checks for data model components that are not in the requirements or requirements that are not represented on the model. If the scope of the model is greater than the requirements, we have a situation known as "scope creep." There is more on the model than is specified in the requirements. Therefore, we are planning on delivering more than what was originally required. This may not necessarily be a bad thing, as long as this additional scope has been factored into the project plan. If the model scope is less than the requirements, we will be leaving information out of the resulting application, usually leading to an enhancement or "Phase II" shortly after the application is in production. For completeness, we need to make sure the scope of the project and model match.
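One simple way to picture the scope comparison is as two set differences. The element names below are hypothetical; in practice they would come from the requirements document and from the model's own metadata.

```python
# Hypothetical element names for illustration only.
required = {"Customer", "Net Sales Amount", "Quarter"}
modeled = {"Customer", "Net Sales Amount", "Month", "Promotion"}

missing_from_model = required - modeled    # model scope is less than requirements
beyond_requirements = modeled - required   # candidate scope creep

print(sorted(missing_from_model))   # ['Quarter']
print(sorted(beyond_requirements))  # ['Month', 'Promotion']
```

Items in the second set are not automatically wrong; they only need to be accounted for in the project plan.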

Also for completeness, we need to ensure that all the necessary data model descriptive information (also known as metadata, which will be discussed in Chapter 16) is populated. So for this category, we need to make sure both data and metadata scope are appropriate. Regarding metadata, there are certain types that tend to be overlooked when modeling, such as definitions and alternate keys. These types of metadata, along with those that are mandatory parts of our model such as data element name and format information, need to be checked for completeness in this category.

For example, in Figure 12.5, the Promotion entity contains a surrogate key but not a corresponding alternate key. The alternate key in this case will be the natural business key. That is, what data element or data elements would make a promotion unique in the eyes of the business? Figure 12.6 contains this data model after this category catch has been fixed by confirming with the business that the natural key is both Promotion Code and Promotion Start Date.

Figure 12.5: Category catch - Surrogate key missing alternate key

Figure 12.6: Category catch fixed - Alternate key added
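In a physical implementation, an alternate key is often enforced with a unique constraint alongside the surrogate primary key. The sketch below assumes column names derived from Promotion Code and Promotion Start Date; the surrogate key column name and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE promotion (
    promotion_id         INTEGER PRIMARY KEY,      -- surrogate key (assumed name)
    promotion_code       TEXT NOT NULL,
    promotion_start_date TEXT NOT NULL,
    UNIQUE (promotion_code, promotion_start_date)  -- alternate (natural) key
)
""")
insert_sql = ("INSERT INTO promotion (promotion_code, promotion_start_date) "
              "VALUES (?, ?)")
conn.execute(insert_sql, ("SPRING", "2024-03-01"))

# A second row with the same natural key should be rejected.
try:
    conn.execute(insert_sql, ("SPRING", "2024-03-01"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Without the unique constraint, the surrogate key alone would happily accept two rows describing the same promotion.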

4. How Structurally Sound is the Model?

This is the "structure" category. This category validates the design practices employed to build the model to ensure we can eventually build a database from our data model, avoiding design issues such as having two data elements with the same exact name in the same entity, a null data element in a primary key, and partial key relationships[1]. Traditionally, when we review a data model, the violations we catch fall into this category, because we don't need to understand the content of the model to score this category. If someone knows nothing about the industry or subject matter the model represents, this category can still be graded accurately.

Many of the potential problems from this category are quickly and automatically flagged by our modeling and database tools. Structural soundness issues that escape the human eye, or are very tedious to check for, can be identified easily by many data modeling tools. One example is checking the primary keys in each entity to ensure they are mandatory and not optional.
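To give a feel for what such automated checks do, here is a minimal sketch over a hypothetical in-memory representation of a model. It is not how any particular modeling tool works; it only tests for two of the violations named above: duplicate data element names within an entity, and a nullable primary key element.

```python
from collections import Counter

# A minimal, hypothetical in-memory model: entity -> list of data elements.
model = {
    "Customer": [
        {"name": "Customer Id", "primary_key": True, "nullable": True},
        {"name": "Customer Name", "primary_key": False, "nullable": False},
        {"name": "Customer Name", "primary_key": False, "nullable": True},
    ],
}

def structural_issues(model):
    """Return human-readable findings for two common structural violations."""
    issues = []
    for entity, elements in model.items():
        # Check 1: the same data element name used twice in one entity.
        counts = Counter(e["name"] for e in elements)
        for name, count in counts.items():
            if count > 1:
                issues.append(f"{entity}: duplicate data element name '{name}'")
        # Check 2: a primary key element that allows nulls.
        for e in elements:
            if e["primary_key"] and e["nullable"]:
                issues.append(
                    f"{entity}: primary key element '{e['name']}' is nullable")
    return issues

findings = structural_issues(model)
for finding in findings:
    print(finding)
```

Neither check needs any knowledge of what a Customer is, which is why this category can be graded without understanding the business.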

In the logical data model in Figure 12.7, the two relationships between Customer and Account are both mandatory. The first relationship captures that each Customer may own many Accounts, and each Account must be owned by a single Customer. This means that an Account cannot be created without a Customer. The second relationship however captures that each Account may be owned by many Customers, and each Customer must own one Account. This means that Customer cannot be created without an Account. We have a circular relationship, meaning we cannot create any instances in either entity because the other entity instance must exist first. We cannot create Bob the Customer, for example, without creating his Accounts. However, we

Figure 12.7: Category catch - Logical data model with circular relationship
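A circular pair of mandatory relationships like this one can be spotted mechanically. The sketch below uses an assumed representation in which each pair means "the first entity cannot be created until an instance of the second exists"; it only detects two-entity cycles, not longer chains.

```python
# Each (dependent, required) pair means: a dependent instance cannot be
# created until a required instance exists. Entity names mirror the
# example; the representation itself is an assumption.
mandatory_dependencies = [("Account", "Customer"), ("Customer", "Account")]

def circular_pairs(dependencies):
    """Return entity pairs that each require the other to exist first."""
    deps = set(dependencies)
    return sorted({
        tuple(sorted((a, b)))
        for a, b in deps
        if a != b and (b, a) in deps
    })

print(circular_pairs(mandatory_dependencies))  # [('Account', 'Customer')]
```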

Once we realize that this circular relationship is really a many-to-many relationship between Customer and Account, we can fix it by adding an associative entity between the two, such as
