• No results found

"Philosophers will always wonder what an entity is. Meanwhile there is some information processing to do! " (Brodie et aI., J 984 p.3)

Introduction

Although the use of object-oriented techniques is increasing, some version of the original E-R Model (Chen, 1976), using some variation of the methodology proposed by Teorey et al. ( 1 986), is still a widely used conceptual data modelling technique (Avison & Fitzgerald, 1995; Bock & Ryan, 1 993; Siau et al., 1996; Flynn, 1 998). It is a central part of many system development methodologies, ranging from SSADM4 (Eva, 1 994) to Information Engineering (Finkelstein, 1 989) and is widely supported by many CASE -tools. There is evidence that in New Zealand the percentage of organisations employing formal data analysis techniques for information systems development, while low, is growing and that many of these organisations are using CASE tools which support some variation of E-R modelling (MacDonell, 1 994). It also seems likely that the E-R Model "is here to stay" (Loosely & Gane, 1990 p.7), at least for the immediate future.

Even the development of object-oriented modelling techniques does not always move us very far from the E-R tradition. Yourdon's Object-Oriented models, for example, bear a marked similarity to the conventions of E-R diagrams (Coad & Yourdon, 1 99 1 ) and Benyon-Davies ( 1 992a) suggests that the inherent strengths of E-R modelling can be extended in such a way as to provide a useful and effective object model. Blaha et al. ( 1 988) go even further claiming that "[o]f the many approaches to relational database design, the Object Modeling Technique is particularly effective"(p.414). A claim which is perhaps partly explained by their insistence that, in OMT, "{t]he notion of an object is synonymous with entity in the ER and LRDM methods" (ibid.p.41 6) and is further illuminated by their observation that "{tJhird normal form is an intrinsic benefit of object modelling" (ibid. p.41 8).

Criticisms of the E-R Model

Many of the earlier claims, that the E-R Model provides an easily conceptualised model (Yao et al., 1 984) using constructs that are highly intuitive (Brodie et aI., 1 984), and useable as an effective communication tool (Teorey et al., 1 986), are now being questioned (Goldstein & Storey, 1 990; Hitchman, 1995). Yet despite the diverse criticisms, it remains a common choice for both high level corporate modelling as well as database design. Perhaps the answer lies in the fact that "higher levels of abstraction such as E-R diagrams are conducive to creative thinking and effective communication" (Blaha et al., 1 988 p.415) and that the production of E-R models is seen as an essential component of the design strategy.' Nevertheless it is criticised for being difficult to use,

, ,teach and understand. The reasons for this apparent contradiction may lie in three areas,'

• that the E-R Model is no longer used in the way and for the purpose for which it was first proposed,

• that the E-R Model referred to by practitioners and some academics is not the E-R Model at all, and

• the necessarily descriptive method employed to construct an E-R model.

The first of these areas has been investigated in previous chapters and will not be discussed in detail here. However, it is worth re-iterating that E-R models were initially

.--

used to represent the conceptual schema and were thus essentially datalogical in nature.

Consequently, they were generally used by those professionals who had already acquired a competent understanding of the physical structure of databases. As a diagrammatic tool for representing this complex abstraction (the database) they were, no doubt, 'intuitive' to many of their early users. However, as an increasing number of non­ technical users have been required to validate their semantic content, E-R models have not been found to be necessarily intuitive to those people who are used to thinking of their data in some other form.

Kent (1978) h&S suggested that much of the difficulty that a user experiences in learning to interpret E-R models has less to do with the complexity of the tool than with the struggle, "to contrive some way of fitting his problem to the tool: changing the way he thinks about his information, experimenting with different ways of representing it". He continues, "much of this "learning" process is really a conditioning of his perceptions so that he learns to accept as fact those assumptions needed to make the theory work, and

to ignore or reject as trivial those cases where the theory fails" (Kent, 1 978, p. 1 94). This is supported by Goldstein and Storey (1 990) who suggest that users need to have a good understanding of the E-R Model before data models become useful to them. Additionally, Kim and March ( 1 995) believe that the users in their experiment may have scored more highly in discrepancy checking tasks with the E-R model than had been anticipated because they had a "higher-than-expected degree of record-orientationl .. (p. 1 l 0). This finding, while supporting Goldstein and Storey' s ( 1 990) proposal, goes further in suggesting that the greater the users' understanding of physical database structures, the more able they are to interpret and validate the E-R model correctly. Kim and March (1 995) conclude that as users become increasingly familiar with E-R models and presumably as their 'record-orientation' increases, the quality of the conceptual data models will improve. It seems likely that this problem has become more noticeable in recent years, due in part to the large increase in the number of people involved. However, it may also be due to a phenomenon that has crept insidiously into the practice of data modelling but gone largely unremarked by either practitioners or academic researchers; the development of the E-RlRelational hybrid mentioned previously2.

The E-RlRelational Hybrid

When the E-R model was originally proposed, it was intended to express data requirements in a way that was independent of the underlying implementations. Initially then, it made little sense to incorporate any aspects of the classic Models into an E-R model ; for example, it can be argued that normalising an E-R model is not necessarily useful unless the target DBMS is relational. Some writers such as A vison and Fitzgerald ( 1 995) who, despite relating nonnalisation to the Relational Model, still consider it to be applicable in other data structuring contexts, challenge this view. However, it is interesting that neither Chen in 1 976 nor Teorey et al. in 1 986 viewed nonnalisation as a part of constructing an E-R Model . Indeed, in the LDMRD (Logical Design Methodology for Relational Databases), Teorey et al. (1 986) specifically describe nonnalisation as a step that follows the mapping of the E-R model to the candidate relations of the Relational

I This tenn appears to signify familiarity with physical data storage, possibly relational, structures 2 See Page 3 1

model. However, as relational DBMSs increasingly became the only target, so a number of relational considerations found their way into the E-R conventions. The result is that many so-called E-R models can be better described as relational models using E-R graphical notation and certainly a number of CASE tools use a subset of E-R conventions3 to provide diagramming facilities for the construction of a normalised relational schema (Ryder, 1 993). Some practitioners even go so far as to dismiss the need for a conceptual model at all. Huff ( 1 992), for example, suggests that,

"It is both unrealistic and unnecessary to design data structures independent of the type of the target DBMS (unrealistic because of problems of physical implementation, unnecessary because DBMSs in companies usually live longer than conceptual data structures)"(p.33).

There is evidence in the literature that would seem to confirm the existence of this

unacknowledged hybrid. Marche ( 1 993) refers to data models, especially normalised entity relationship ones" (p.44). Cerpa (1 995) too, considers that "a conceptual model is an entity-relationship model4 in 3NFs" (p. 353), while Batra and Zanakis ( 1 994) specifically state that an "E-R diagram developed with the relational representation as the target must obey the normalisation concepts" (p.228). Benyon (1997) goes further and declares that "a completed E-R model consists of a diagram (the E-R diagram), a corresponding set of fully normalised (i.e. in 5NF) relations and additional information"

(p. l 60).

However, if an E-R model has to conform to normalisation rules, it cannot claim to be a user's view of the world "without any consideration of how data is stored on computerised files" (Jarvenpaa & Machesky, 1 989 p.367). Normalisation is a property of relations (Cerpa, 1 995) and is, therefore, a relational process that relies on the existence of well defined, minimal primary keys, and the transformation of information bearing relationships into entities, including the resolution of many to many relationships. These latter considerations are appropriate to the design of relational databases but not necessarily to those of other implementation paradigms and, with the exception of primary keys, were not a part of the original E-R Model (Chen, 1 976).

�he relationship diamond is almost never used.

4 This is referenced to Chen1976.

An E-R model that is fully normalised is thus not really an E-R model, as envisaged by Chen, but a relational model represented by similar graphics. This distinction may go some way to explaining some of the contradictory research. After all a relational model, masquerading as an E-R model, may well be more difficult for users to understand, providing, as it does, a much more constrained view of their data than a 'purer' E-R model would require. Like Hitchman ( 1 995), Kim and March (1995) make the interesting comment that a possible explanation for the lack of understanding of some of the semantic concepts that they observed, may be attributable to the lack of such constructs in the Relational Model. They also go on to say that the findings do not necessarily show that the respondents could not produce relational data structures (Kim & March, 1 995). As Simsion ( 1 994) remarks, "from a practical perspective, there is not much value in adopting a data modelling language that is not compatible with current database management systems or CASE products"(p.22). Indeed, this would seem to reflect the situation that Hitchman ( 1 995) found among practitioners in the UK, that "a significant number. . . do not seem to understand some available semantic constructs of the entity­ relationship model" (p.39).

The existence of the hybrid is hinted at by Batra et al ( 1 990) who observe that "in the general database design literature and in the practice of professional data analysts6, the Relational Model has been used to conceptualise data requirements" (p. 1 26) and also that some form of E-R graphical notation is commonly used to depict these relational models. There is little direct evidence of how widely the hybrid is used. However, personal observation and the findings of some researchers suggest that practitioners are using the E-R conventions and terming the products of this usage E-R models, although they are actually constructing implementation-oriented, relational designs rather than communication-oriented, conceptual models (Hitchman, 1 995; Kim & March, 1 995).·

Simsion and Shanks ( 1 993) for example discovered that among 39 practising data modellers in Australia not one used the Chen diamond notation for relationships. This may help to explain Kim and March' s (1995) conclusion that "users as well as IS analysts should be trained in an appropriate conceptual modeling formalism" (p. l l l ), as well as explaining why the E-R Model would appear to be no longer seen as a particularly useful

communication tool (Hitchman, 1 995). A clear expression of this confusion among practitioners is highlighted by Chee-Pun Wong ( 1 995), in the practitioner magazine 'Database Programming and Design', where it is stated (incorrectly) that "E-R modeling ... does not support attributes within relationships" (p.42). The statement however, would be quite legitimate if referring to relational modelling, which is of course the actual formalism under discussion.

The merging of E-R graphical notation with the construction of logical, relational schemas is common in textbooks and training manuals, many of which are adopting a pragmatic attitude towards the hybrid, perhaps giving an indication of the extent of the current practice. Simsion ( 1 994) in a chapter entitled 'The Entity Relationship Approach' describes how to represent a previously normalised relational design as an E-R diagram; while in a chapter entitled 'The Conceptual Schema' Avison ( 1 992) specifically details the normalisation of the entities in an E-R model. Benyon ( 1 997) is quite specific:

"there is a very close correspondence between the E-R model and the relational model. Indeed, in the methodology provided in this text, together they form the information model...The E-R model being the main user­ oriented part of the information model and the relational representation being the computer oriented part" (p. 1 80).

De Carteret and Vidgen, on the other hand, make no reference to the E-R model and are quite clear that they are writing about relational modelling. For them "another way of representing an entity is as a relational table" and they are careful to refer to diagrammatic representations as "Entity Diagrams" rather than the more common 'Entity-Relationship diagrams' (de Carteret & Vidgen, 1 995 p. 1 2). However, this is unusual. Most texts either describe clearly a purist E-R modelling exercise and then its transformation into a 'logical' relational design (e.g. Ricardo, 1 990; Sanders, 1 995; Teorey & Fry, 1 982) or compress the two phases into one data modelling activity where the E-RlRelational hybrid is clearly evident (e.g. Benyon-Davies, 1 996; Simsion, 1 994; Veryard, 1 984; Veryard, 1 994; Watson, 1 996). The anonymous authors of the AURIS A Data Modelling Workshop in 1 99 1 were quite clear about their rather contradictory position,

"The modelling techniques covered in this workshop are based on E-R modelling .. . E-R models are . . .independent of the physical data model under which the database will be implemented. The approach adopted by the

authors however, is to develop models from which relational structures are

directly derivable7 .. (p.9).

There may be another, very pragmatic, reason for the widespread adoption of this hybrid model by practitioners and this relates to the actual process of creating a useful model. A truly infological E-R model may not be directly or easily, transformable into a relational schema. A lack of appropriate unique identifiers, the existence of one-to-one or many-to-many relationships, the existence of derived data and relationships that carry attributes, may accurately reflect the users' view of their domain. However, it may also force the modeller to make decisions, based either on a personal interpretation of the enterprise or on previous experience, at the point of transformation to a datalogical representations. It may well be that the better a conceptual E-R representation is at fulfilling its infological role, the less effective it is at its datalogical role and vice versa. If conceptual models are mainly constructed by IS practitioners rather than users (Hitchman, 1995), then it would not be surprising that the resultant models were favouring the relational, datalogical representation. It may also be, for reasons that will be explored later9, that the construction of a datalogical representation is significantly easier, particularly for less experienced modellers who also have an understanding of relational theory.

It is clear that the E-RJR hybrid models are being constructed and are commonly used in professional practice and the benefits of this approach must seem attractive to developers working within strict budget and time constraints. By combining the infological and datalogical functions of the conceptual model into one representation, there are apparent savings 10 in seeming to speed up the database design process and facilitating the design of the logical data structures on which the functional analysis is often based. However, there are a number of negative implications of this cost cutting. As has been previously suggested, users are reportedly finding it difficult to understand and thus validate these representations. This must reduce the models' effectiveness as

7 Emphasis added

8 Or even earlier. Benyon (1997) suggests that in the E-R model. M:M relationships should be decomposed during the early stages of data analysis.

9 See Page 66 10

strategic planning and communication tools, as well as being of little use in recording

enterprise rules. If even those users who have been closely involved with the

construction of the model find difficulty in interpreting it, it is unlikely to be of much use in a wider arena or over a period of time. The models are unlikely to use elements of the E-R Model, which, while providing rich semantics are not readily implementable

in a relational database. Finally, while organisations may believe that they are

constructing a 'conceptual' model for their applications or even for their enterprise, they are unlikely to reap the future benefits implied by the meta-data architecture previously described 1 1 .

Constructing an E-RlRelational hybrid model

There is another factor relevant to the development of the hybrid and this relates specifically to the process by which it is constructed. While it is recognised that conceptual modelling is a complex activity, particularly for non-expert designers, there has been little research on the actual process of conceptual model construction (Batra & Srinivasan, 1 992; Yunker, 1 993; Benyon-Davies, 1 992b; Srinivasan & Te'eni, 1 995) particularly on the activities within the discovery phase (Kim & March, 1 995). Batra and Zanakis ( 1994) point out, that while most techniques provide methods to translate an E-R diagram to a relational schema, "they do not detail a precise set of rules and heuristics to develop the E-R diagram itself' (p.228). Eden (1996) too comments that "the literature falls short of providing a definitive and comprehensive account of the process of data analysis by the Entity-Relationship approach" (p.42). The guidance generally given is summarised by Fims ( 1 993) thus,

"One approach is to develop the diagram first and then to define data elements and assign these as attributes to the entities in the data model and finally to specify key structures and other details. The second approach is to define the data elements first, then develop the diagram and assign attributes to entities as the diagram evolves. In practice a combination of the two approaches provides the most intuitive basis" (p. 1 15).

While these instructions are specifically related to the use of a CASE tool, the general principles mirror those in most descriptions of the process. Some authors provide a case study to illustrate the process or more commonly rely on simple straightforward examples.

Eden ( 1 996) identifies that, with the exception of NIAM, none of the literature, "succinctly addresses the problem of how to securely guide the novice from a requirements statement in Natural Language to a conceptual schema" and few "recognise the importance of linguistic analysis" (p.43). Instead, novices are directed to identify the

'important' entities and the relationships between them.

Beyond a broad definition of what an entity is, little direction is given in how to identify them. Interviews with users, analysis of input and output documents are usually