• No results found

Categories / Union Records

2.1 – ERM and EERM

UNDER TAKE

2.2.2. Categories / Union Records

A useful addition to the ERM is the category construct (as introduced by Elmasri et al

[ELMAS85] and also found in NIAM models as an exclusion constraint [VERHE82]). Let us use

our EMPLOYEE entity with TECHNICAL, ADMIN and MANAGER sub-classes. If we want to represent a relationship between an employee and a piece of work then we can relate EMPLOYEE and an entity called WORK. If it transpires that only instances from TECH and

ADMIN should relate to WORK instances then we have to reconsider the diagram. We could leave the relationship between EMPLOYEE and WORK and write a note that prohibits the participation of MANAGER instances. Or we can create a new entity, if multiple inheritance is allowed, that sub-classes TECH and ADMIN, say called TECH_ADMIN_EMP, and relate this to WORK. Unfortunately this is not semantically correct as the instances of TECH_ADMIN_EMP include all attributes of both TECH and ADMIN. Also if the two entities are not really relate-able through a sub-class relationship, then creating an entity that inherits from both is generally not in good style.

A better solution is to use the category artefact whereby a “new” entity denoted in the diagram represents the union of the instances of a number of entities and this union is a pool of instances that can participate in a relationship. The meaning of this construct is slightly different from the previous relationships as entities participating in the union would generally have different primary key set constraints; one solution is to include details of both entities and instances in the relationship instance details. The notation to represent the category concept uses the circle with a letter “U” inside it – see figure 2.6. Also the category construct can be qualified as total or partial. We are not too comfortable about introducing a “new” rectangle (as in [ELMAS85]) and really prefer the NIAM exclusive

notation (i.e. connect the diamond to the circle which is connected to each entity participating in the union and connect the emanating relationships with an arc). There are two reasons for this: firstly introducing an entity which is really a “view” of other entities and secondly some flexibility is lost as there can be only one cardinality ratio for all entities participating in the union.

Conceptual Modelling - Page [ 20] EMPLOYEE D MANAGER TECH ADMIN WORK ON U N 1 M

Fig. 2.6 – An EERM with a category (or EOR) artefact – i.e. arc.

2.2.3 Aggregation

Aggregation builds an instance out of a number of other instances through a sophisticated

relationship between the “whole” and its “parts” instances. This relationship is usually transitive but not necessarily. Aggregation, sometimes colloquially called the part-of relationship, is at a higher level than entity composition. Although this rich construct has been known for a long time (through Smith and Smith’s work in [SMITH77b]) few diagrams

actually use it. Later and significant literature includes Steel’s work on Lisp [STEEL84], the

MCC Orion initiative [KIMWO89G], UML [BOOCH00], and Barbier’s [BARBI03]. A telling

rendition of aggregation through a ‘bill of material’ diagram for a bicycle is shown in figure 2.7. While the front wheel brake unit is a part of the whole bicycle, the same unit is in itself a whole, at least in terms of spare part sales, as it is made up of other sub parts.

Fig. 2.7 – Exploded drawing of a bicycle (Source: Raleigh Grifter DL60 Exploded Drawing from 1977 Raleigh Dealer Manual).

Conceptual Modelling - Page [ 21]

We need to have an aggregated relationship which captures: 1) sharing and independence of “parts” by the “whole”; 2) total or partial participation of “part” instances; 3) attributes to each manifestation of the “part” in a transitive relationship; 4) attributes of the whole transitive closure. In an aggregated relationship sharing and independence are orthogonal and consequently we have four possibilities – see figure 2.8.

or or

Exclusive

Sharable

In

d

ep

en

d

en

t

D

ep

en

d

en

t

E1 E11 R1 A 1 N E1 E11 R1 A N M E1 E11 R1 A N M E1 E11 R1 A 1 N E1 E11 R1 A 1 N E1 E11 R1 A N M

Fig. 2.8 – Aggregation in EERM (no transitivity is shown)

As for notation there are two main models, one found in Elmasri [ELMAS10] and the other in

Teorey [TEARE05] which is the one adopted here and it uses a special circle with an “A”. In

figure 2.8 we give the four combinations of an aggregate relationship through the latter notation and using sharing and independence to qualify relationship. It is important to note that sharing is connected by a many-to-many relationship. What do the four patterns

Conceptual Modelling - Page [ 22]

yield in terms of the aggregation relationship? The following table gives actual examples; see table 2.1.

Independence Sharing Example

Independent Sharable Cyclocomputer applet that tracks effort in terms of distance and climb.

Independent Exclusive A bicycle tyre. Dependent Sharable Frame spray paint.

Dependent Exclusive Lock nuts (following advice that lock nuts are to be used only once!)

Table 2.1 – Four aggregation patterns through independence and sharing.

An example diagram that can meet some of the data modelling requirements of an aggregation in the manufacturing of books is in figure 2.9. In this case there is no part-of transitive relationship. Another example relationship that describes a typical bill of material composition is given in figure 2.10; it has a transitive part-of relationships.

BOOK

COVER

CHAP.

CDROM

En Caps

A

1

1

Has

N

N

Inclds

N

1

Total Cost Cost Total Pages

Cost Pages Cost

TPgs

H.bound

Conceptual Modelling - Page [ 23]

PART

A

N

Assoc.

M

Qty

ITEM

is a

N

1

Assem.

A

1

N

Join Cost Basic Cost SN# Cost PN# Physical Logical

Fig. 2.10 – A bill of material EERM with transitive relationships. Note that entity item represents tangible parts, while parts entity is a catalogue description of items. For an aggregation relationship to become distinctive from a normal relationship one has to consider two aspects. The first has to do with instances of a relationship; specifically each instance of a normal relationship is independent from any other instance but in the case of an aggregation relationship many instances are related through transitive closure. The second aspect is the asymmetric role of an aggregation relationship that contrasts with the symmetrical nature of a normal relationship. Consequently before introducing an aggregated relationship the analyst must carefully consider these two aspects. In practice these distinctions might not offer conclusive arguments in favour of inclusion of an aggregation relationship in a diagram.

There are a number of implicit constraints associated with an aggregation relationship representation in the EERM diagrams. Some examples follow (refer to figure 2.11).

a) If a “whole” instance is deleted and that instance is in a transitive aggregation relationship that has an encoded non-shareable and dependent part-of relation with other instances then all these “parts” are purged too. Consequently a portion of the graph, the sub-graph rooted at the deleted entity instance of the graph, is deleted. In figure 2.11 deleting instance ‘c’ reduces graph A) to C).

Conceptual Modelling - Page [ 24]

b) Assume we have an aggregation relationship between the “whole” and its parts in a transitive aggregation relationship that has an encoded independent part-of relation; then deleting an instance would generate a number of transitive closures. That is, the original graph can break into “smaller graphs”. In the example deleting instance ‘c’ reduces graph A) to D). f g h e c d b a f g h e c d b a b a f d g h e b a A) B) C) D)

Fig. 2.11 – A number of instances (named ‘a’ to ‘h’) are related with a part of relationship – shown as edges A). The instances transitive closure is represented in B). Graphs C) and D) are the result of deleting instance ‘c’ with different part-of constraints.

Some “convoluted” constraint requirements are usually identified during design; for example the costs of “parts” should not exceed 30; the depth of transitive closure tree should be between 3 and 5, the total distance of a travel plan should not exceed 200 units. Most of these have to be included as notes to the EERM; notice the use of an entity’s computed attributes in figure 2.9 and 2.10 are assigned values computed by tallying the sub-graph.

Some notes of caution have to be mentioned here. Firstly, it is possible to create legal diagrams that do not make intuitive sense. Secondly, some combinations of the above features can be represented as a simpler ERM (e.g. a N(p)-1(t) between a weak entity and its related entity can be represented much more simply as a basic aggregation relationship). As for the former we assume a pre-processing of an EERM can detect these

Conceptual Modelling - Page [ 25]

“aberrant” relationships, while for the latter we also assume that the simpler solution is drawn.

Nonetheless the biggest problem is that these advanced constructs require more than malleable structural rules; in fact good transitional data description constructs are required.

2.3

How and How not to Use ERM

ERM is a high-level language and we have seen how its diagrams capture a reasonable level of structures, relationships and constraints. From a software engineering perspective there is yet another important aspect for using ERMs for database analysis and design: how to go about creating the diagrams – i.e. a methodology. If the use of ERM is part of a larger methodology (e.g. SSADM) then it is best to follow the guidelines of the methodology and representation. If, on the other hand, we are interested in building the database and depicting its role in the overall functionality, then a methodology as presented in Batini’s et

al textbook [BATIN92] is a well-known technique.

In general an ERM is used in top-down design methodologies where these are given an informal description of the data requirements and the diagram is built by refining and building on successive versions of the diagram.

The general advantages of using ERM’s proposals in database design are the following. 1) ERMs are high-level languages that are relatively easy to read by a wide spectrum of

users (analysts, designers and selected end-users).

2) ERMs are independent from any particular data model and especially any database physical constructs (e.g. indices, or data placement policies).

3) ERMs describe a high proportion of the data structures, relationships and static constraints that an information system requires. These models are in fact used in describing the conceptual and external schemas of the ANSI/SPARC three-level data architecture. Also some papers use ERMs as a basis for building co-operative information systems (e.g. a loosely couple multi-database framework); a a seminal paper is [HEIMB85].

Conceptual Modelling - Page [ 26]

4) ERMs are also used as a test bed during the analysis and early design phases. The diagrams are useful to verify and validate an information system’s high-level requirements.

5) ERM’s constructs are not fudged or dictated by the target database model; for example, with a relational data model an M-N relationship is broken into two 1-N relationship and a resolving entity.

The main disadvantages of using an ERM approach in database design are the following four.

1) A properly drawn ERM does not guarantee it is devoid of data redundancy. In fact it is highly appropriate to re-design critical areas of the diagram through bottom-up data analysis (especially if the target data model is the relational then at least Boyce Codd Normal Form should be established). Also some patterns in a diagram require the analysts to re-check these (more specific examples, such as connection traps, are given in the following sub-section).

2) For information systems that have a “large” design the size of the diagram and interconnections make it hard to read and maintain (e.g. re-draw). Some researchers (a case in point being Elmasri [ELMAS10]) do attempt to “package” the diagram

through a partitioning into simpler diagrams. Also some highly data-driven and modular systems (with an end-less list of start-up and operational parameters – e.g. SAP accounting package { WWW.SAP.COM }) makes the diagram a “complex” graph.

Consequently EERMs do not scale with size of design.

3) There is an issue of drawing standardisation. This is made worst by having Graphical User Interface design tools that are too tightly coupled to a specific DBMS. 4) A query model, based on ERM, is not available per se. Nonetheless, subsets of ERM diagrams are used to describe the structure of an external schema – a view. An interesting development in UML from OMG is Object Constraint Language (OCL)

[OMGRP06] and Query/View/Transformation (QVT) [OMGRP11]. 2.3.1. ERMs and CASE tools

A number of tools are available to the designer who wants to adopt ERM modelling. These provide drawing, and conversion to a database model – mostly relational and a few object

Conceptual Modelling - Page [ 27]

relational. Popular examples include CA Erwin { WWW.ERWIN.COM }, Toad Data Modeller {WWW.QUEST.COM/TOAD-DATA-MODELER }, and DB Designer (which is open source and recently

renamed) { WWW.FABFORCE.NET/DBDESIGNER4 } – see figure 2.12. The better ones allow for

customisation but mostly ‘improve’ on Chen’s original diagram artefacts. A useful feature is the ability of a tool to read a database data dictionary and reverse engineer an ERM.

Fig. 2.12 – Screen dump from one of the conceptual design CASE tools.

2.4

Problems with ERM/EERM diagram

A perennial problem with ERM is whether a fact in the information system is presented as an entity or as an attribute in another entity. A case in point is an address abstraction. The design heuristic taught for resolving this issue is checking in the requirements whether an address ever needs to be decomposed into further parts. If it is then convert the attribute into related entity. Another aspect to consider when making this decision is the data quality level requirement; for example does the address postcode need to spatially coincide with that of the Post Office. For example, if our information system does not

Conceptual Modelling - Page [ 28]

match on addresses but feeds into another system which does that (e.g. a data warehouse

[INMON02]) then the address is better represented as an entity.

Another nagging problem is the admission of “null” values. In databases a null occurs mainly in two situations. The first it comes about when an instance does not have an applicable value for an attribute (e.g. a company does not have a birthday). The second comes about from the lack of knowledge about an entity’s attribute (e.g. a person has a date of birth which is not known in the system). The occurrence fudges the meaning of instances, as the presence of a null in an attribute’s value is ambiguous. It is best to “design out” the possibility of null assignment in parts of the ERM where computations and comparisons are intense.

There is another level of problems with ER diagrams. These are patterns that, based on arrangements of entities and relationships, although structurally correct, do not completely convey the meaning of the domain of discourse – these patterns fall under the generic term of connection traps. Also some constraints are either too specific or too general; and consequently the constraints of the application are not exactly right.