
an integrated framework

"Thus the truth of things may be this: useful things get done by tools that are an amalgam of fragments of theories" (Kent, 1978 p. 194).

Introduction

Chapters 3 and 4 explored the need to recognise analysis and design as clearly separate activities within the conceptual data modelling process. The roles of the infological and datalogical aspects of a conceptual model, and their required properties, were clarified. The general approaches of E-R and NIAM were described in Chapters 5 and 6, and Chapter 7 provided a comparison of the major differences between the two. It was suggested that the NIAM-CSDP was essentially an analysis tool while the E-R or E-R/R approach had a number of characteristics that lent it to design activities. Finally, Chapter 8 demonstrated the confusion surrounding the definition of a conceptual data model and the lack of consensus among researchers on what should be considered desirable characteristics. In Chapter 8 it was also observed that the various modelling approaches are generally considered, and compared, as alternative rather than complementary methods. Taking cognisance of these issues, a framework based on an integration of elements of the NIAM-CSDP and E-R/R approaches to conceptual modelling is now proposed.

Previous Proposals

In 1984 Bouzeghoub and Gardarin reportedly developed an expert system, SESCI, "to support requirements collection in natural language followed by logical and physical design of relational database applications" (Ram 1995 p.97). Kent (1983) has also explored similar ideas. With no apparent knowledge of NIAM or the early work which led to its development, Kent (1983) proposed the identification of facts, defined 'as connections between things', as a more effective method of creating a datalogical structure. One perceived advantage of this method was that no arbitrary decision was required early in the process to distinguish between 'entities' and 'attributes'. Thus both 'Employees have names' and 'Employees are assigned to departments' qualify as facts, or more specifically 'fact types or fact patterns', and there is no requirement to allocate any of the objects to a specific type of construct. Once the facts that the database is required to maintain have been identified, they are grouped together in a semi-intuitive way whereby all binary facts that have one object in common are placed in 'pseudo-records' that eventually become relations.

Provision is also made to accommodate ternary facts and 'facts about facts'1.
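The grouping step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each binary fact type is recorded as a (subject, predicate, other object) triple and that the shared object is always the subject. Kent's own grouping is described as semi-intuitive, so this mechanical rule is a simplification, and the sample fact types and the function name `group_into_pseudo_records` are hypothetical.

```python
from collections import defaultdict

# Hypothetical binary fact types: (subject object, predicate, other object).
FACT_TYPES = [
    ("Employee", "has", "Name"),
    ("Employee", "is assigned to", "Department"),
    ("Department", "has", "Budget"),
]


def group_into_pseudo_records(fact_types):
    """Group binary fact types by the object they share (assumed: the subject).

    Each group is a 'pseudo-record': the shared object followed by the
    other objects that it is connected to.
    """
    records = defaultdict(list)
    for subject, _predicate, other in fact_types:
        if other not in records[subject]:
            records[subject].append(other)
    return dict(records)


pseudo_records = group_into_pseudo_records(FACT_TYPES)
for shared_object, others in pseudo_records.items():
    print(shared_object, others)
```

Note that no object had to be classified as an 'entity' or an 'attribute' before the facts were recorded; that distinction only emerges when the pseudo-records are formed.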

Kent's (1983) proposal does not appear to have influenced the most commonly used E-R approaches; nevertheless it is significant for two reasons. Firstly, it highlights and discusses a number of shortcomings of E-R modelling as an analysis tool and, secondly, it suggests the use of natural language facts as a means of overcoming some of these problems. However, the method is not "developed to the point of a detailed procedure" and the article primarily "describes the concepts on which the methodology is based" (ibid. p.4). One of the shortcomings of Kent's proposed method is that it does not clearly delineate the activities of analysis and design. Kent himself seems rather unclear as to where his proposal sits, saying, "we focus on the middle portion of the analysis and design process" (ibid. p.3). Of course, as previously noted in Chapter 4, he is not alone in failing to make any real distinction between the two stages in conceptual data modelling.

It thus seems useful to explore the feasibility of integrating the two approaches of NIAM and E-R/R modelling to determine whether all or any of the described problems can be solved, or at least alleviated, by doing so. Rather than attempting to suggest modifications to any particular system development method, a new approach, the Integrated Conceptual Modelling framework (INTECoM), is proposed which uses the four-step generic framework of database development, outlined previously on page 27. While the framework covers all four stages in outline, only the first two stages, the

1 These are represented in NIAM as objectified fact types, i.e. fact types that themselves participate in

activities of which have been the focus of the preceding discussions. The discussion of the framework also focuses solely on the design of relational databases.

INTECoM - An Overview

Step 1. The analysis of user requirements

It has already been suggested that analysis is concerned with determining and describing the components of something complex. Regardless of the viewpoint of the analyst, there is a significant element of discovery, or more specifically 'uncovering', about the activity. When dealing with a specific user, the analyst is required to behave in a largely objectivist way to reveal the perceived information requirements of that particular user. A record of those information requirements is required that is comprehensible to, and verifiable by, both the user and the analyst, and ideally others, perhaps at some point in the future. The documented account of the information requirements is the infological or analysis model, referred to in some methodologies as the 'requirement specification'. Ideally, the process followed to extract the user requirements should be predictable and repeatable, i.e. any analyst given the same task should arrive at the same result. In other words, a prescriptive method is preferable, and this also has the advantage of being auditable. In addition, the analysis model needs to be consistent, unambiguous and transformable into a data structuring representation with no loss of validity and, thus, provide a solid foundation from which to begin design.

The NIAM-CSDP provides a procedure that largely meets these criteria. It has been shown to provide a prescriptive method, which requires the active involvement of the user in providing both the facts and the examples. The direct correspondence between the ORM diagram that is constructed and the formalised natural language example fact types from which it is derived allows for different representations suitable for either the technical or non-technical user with no information loss. Indeed, as Sharp (1994) argues, non-technical users need never see, or even be aware of, the graphical representation. However, the completed model is transformable, by application of a published algorithm, into a normalised relational design. Most importantly, the

is not necessary to decide on the type of construct2 that will be used to represent an object before the object, or any facts in which it participates, can be recorded.

The use of a CASE tool, such as InfoModeler™, simplifies the collection and maintenance of the natural language facts and the appropriate example set and automates much of the diagram construction. InfoModeler™ can utilise the entered examples to determine many of the necessary constraints and will highlight situations where the analyst has overridden the determined constraints such that they no longer match the examples. In addition, InfoModeler™'s comprehensive 'Verbalizer' reports provide all the fact type sentences together with detailed descriptions of all of the objects, at any point during the analysis. InfoModeler™ also provides facilities for checking syntax, constraints and examples and prevents the generation of a logical model if syntactical problems exist. Finally, InfoModeler™ is able to automatically apply the transformation algorithm and provides what it terms a 'logical model' in IDEF1X3 notation.
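The idea of checking constraints against an example population can be sketched as follows. This is not InfoModeler™'s actual algorithm, which is not described here. It is a minimal illustration that assumes each example fact is a tuple with one value per role, and that a uniqueness constraint on a single role is consistent with the examples only if no value repeats in that role. The sample population and the function names are hypothetical.

```python
# Hypothetical example population for "Employee works in Department".
WORKS_IN = [
    ("E1", "Sales"),
    ("E2", "Sales"),
    ("E3", "IT"),
]


def unique_roles(population):
    """Return the role positions whose values never repeat in the examples."""
    rows = set(population)  # duplicate example rows carry no extra information
    if not rows:
        return []
    arity = len(next(iter(rows)))
    return [i for i in range(arity)
            if len({row[i] for row in rows}) == len(rows)]


def conflicts(declared_unique_roles, population):
    """Return declared single-role uniqueness constraints the examples refute."""
    consistent = set(unique_roles(population))
    return sorted(set(declared_unique_roles) - consistent)


print(unique_roles(WORKS_IN))    # roles on which the examples show no repeats
print(conflicts([1], WORKS_IN))  # a declared constraint the examples contradict
```

Examples can refute a constraint but cannot prove one: a role that happens to show no repeats in a small population may still repeat in practice, which is why the user's confirmation remains part of the procedure.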

Darke and Shanks' (1995c) legitimate criticisms of NIAM-CSDP as a tool for requirements elicitation are not addressed by the above description, and it is possible that Step 1 of the CSDP could be enriched, as they suggest, by both extending the range from which example sentences are derived and the construction and integration of stakeholder viewpoints (Darke & Shanks, 1994a; 1995a; 1995b; 1995c). There does not seem to be any inherent reason why the NIAM-CSDP could not accommodate both of these suggestions. Indeed, the "assumption that as only one interpretation exists, only one user expert is necessary" (Darke & Shanks, 1994b p.5) is a result of the way in which the method has been used rather than an essential prerequisite of the techniques. Neither is there any inherent reason why the CSDP has to be confined to moving "directly from input and output documents to verbalisation" (ibid p.4) nor why conflicting views and alternative viewpoints cannot be explored. If the results of the NIAM-CSDP are seen as a precursor to design and not as the design itself, the pressure to resolve these differences is largely removed. Darke and Shanks' (1995b) proposal to input the integrated viewpoint, developed during the wide-ranging requirements elicitation phase, to the fact type transformation process could in fact be postponed until after the transformation. In this way, a record of each user's requirement as determined during analysis is preserved as discrete documentation. This could prove useful as a basis for the construction of the external schemata, i.e. Step 4 of the database design process. In addition, the resolution of conflicts becomes firmly a design issue, which is where it more appropriately belongs.

2 This point is arguably not as clear-cut as portrayed here. In most versions of ORM it is necessary to decide whether an object is an 'entity type object' or a 'value type object', sometimes called a NOLOT and a LOT respectively. However, the important point is that no matter which is chosen by the analyst, the functionally appropriate construct will be automatically chosen at the point of transformation.

3 IDEF1X is a form of E-R modelling widely used in the US and particularly by the Department of Defence.

The output of the INTECoM analysis phase is a record of users' data requirements, represented as both a set of formalised natural language sentences enhanced with examples and constraint information, and a diagrammatic representation. This final record may be an integrated view or a collection of individual user views. Whichever it is, each discrete record should be internally consistent, unambiguous and as complete as possible. It should also have been verified, by the users, as accurate and understood. This model is the conceptual model of the meta-model discussed in Chapter 3.

Step 2. The design of the conceptual schema or logical model

Design has been previously described as the combination of elements into a plan or scheme that conforms to appropriate functional or aesthetic criteria. Thus the design activity is one of creative, and possibly innovative, construction. It is an attempt to bring together possibly conflicting and disparate elements into a harmonious and, ultimately, useful whole. As such, the activity is difficult, if not impossible, to prescribe, relying as it must on the individual flair and creativity of the designer, who will almost certainly bring past experience and experimentation to the work. Attempts to constrain this creativity by mechanistic prescription are likely to be counter-productive. However, the designer needs to have clearly defined elements to work with. An understanding of the required data structuring paradigm is an essential pre-requisite, as is a clear idea of what is required, i.e. the user requirements. The final output of this step will be a datalogical or design model of the data structured in a form that is

model is a paradigm model within the meta-data architecture illustrated on page 35.

This output is likely to appear, to the users, to be significantly different from the previous one and while it is impractical to insist that the method used to create it is auditable, nevertheless there needs to be some means of verifying that the original requirement specifications are still being supported.

The E-R/Relational hybrid approach has been seen to provide techniques which are appropriate to this kind of design activity, at least where the target DBMS is a relational one. It is an inherently creative tool allowing for the development of alternative data structures, which can embody different levels of business rules and constraints. Many of the structures will be those suggested by the patterns identified within the relationships of the data elements themselves, but new patterns can be constructed or existing ones enhanced to provide innovative solutions. Used specifically as a design tool, the propensity for entities to be equated with relations is no longer problematic, while the need for the designer to make an informed choice of construct for any specific element is no longer dangerous but to be positively encouraged. By the same measure, the requirement for entities to be strictly typed is no longer a cause of difficult communication between the user and the analyst/designer. Instead, it can become a positive advantage to the designer, who is now concerned with identifying entities that can be transformed into their strictly typed counterparts within the relational model. In design, there is no longer any expectation of one 'correct' answer but instead an expectation of a number of useful solutions, all of which will have their own advantages and disadvantages. Likewise, decisions as to which part of the system will handle each business requirement can have a direct bearing on the form of the data model (Simsion, 1994), and belong more properly to the designer than to the analyst. After all, understanding the compromises and trade-offs involved in the final choice is part of the designer's skill.

Many CASE tools support the use of the E-R/R hybrid techniques which are promoted here, and a review of these is beyond the scope of this discussion. Even InfoModeler™ provides some support. The logical model which InfoModeler™ creates is an E-R/R hybrid model and can be modified in keeping with the design activities suggested above. As InfoModeler™ does not support any form of functional or behavioural modelling it

logical model can be turned directly into a database schema which can be input into any CASE tool which supports the reverse engineering of logical models from database specifications.

The suggested input into the design stage is the output of the analysis phase, described above, which is immediately transformed into a relational logical design. This may be one integrated design or preferably a number of discrete designs, representing the individual user views, for integration by the designer. Design thus begins with clear statements of user requirements, represented in a form that is both familiar and appropriate to the designer. The only requirements which may not have been included in the analysis phase are those pertaining to the less concrete areas, such as those which arise from future expectations of the system. These requirements, being only possibilities, may not have been identified or fully captured during analysis. An essential ingredient of the design model has to be the flexibility to adapt to future possibilities, and the designer needs to be aware of and prepared to incorporate these. Apart from these unknowns, the designer is able to gain a holistic view of the system's requirements relatively quickly and, if the analysis has been carried out competently, with an assurance that no nasty surprises await discovery at a later stage in the process. Thus the development of alternatives and possibly the creation of exploratory prototypes can probably begin early in the design phase. The output of the design stage is a data model conforming to the appropriate paradigm constraints (i.e. normalisation) and ready for transformation to a physical database schema. Its form will thus conform to the usual expectation of the E-R/R hybrid approach, that is, an E-R/R diagram4 supported by the usual data dictionary documentation.

4 As supported by most CASE tools and practitioner methodologies. It is specifically not the Chen standard.

This final design model may well be unrecognisable to the users who provided the initial specifications, yet it is essential that they are able to judge that, despite the resolution of conflicting requirements and incorporation of future possibilities, their requirements can still be met. It appears that, once again, the situation requires users to understand and verify design documentation. This is obviously not acceptable, and any proposal that leads to this endpoint is unlikely to hold any advantages over its predecessors. Formalised natural language has been advocated as the preferred user representation for the analysis model and it is proposed that the design model follow a similar course. Chapter 10 describes a method for extracting NIAM type sentences from an E-R/R model to provide not only an understandable translation of the design specification but also a means of linking the design model directly back to the original user requirements.
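To give a flavour of what extracting sentences from a design model might involve, the following is a minimal sketch and not the method of Chapter 10. It assumes, for illustration only, that a relation is described by its name, its key and a mapping from each non-key column to a flag saying whether the column is mandatory. The sentence template and the function name `verbalise` are hypothetical.

```python
def verbalise(table, key, columns):
    """Render one sentence per non-key column of a relation.

    columns maps a column name to True if the column is mandatory
    (NOT NULL) and False if it is optional.
    """
    sentences = []
    for column, mandatory in columns.items():
        quantifier = "exactly one" if mandatory else "at most one"
        sentences.append(
            f"Each {table} (identified by {key}) has {quantifier} {column}."
        )
    return sentences


# Hypothetical relation: Employee(EmployeeNr, Name, Department)
for sentence in verbalise("Employee", "EmployeeNr",
                          {"Name": True, "Department": False}):
    print(sentence)
```

Sentences of this kind can be placed beside the fact types recorded during analysis, so that users verify the design in the same representation in which they originally stated their requirements.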

Step 3. The design of the internal schema or physical database

This phase would seem to be much less problematic, with many textbooks and industrial courses providing a standard approach (e.g. Connolly et al., 1995; Date, 1995; Ricardo, 1990). While a detailed discussion of this step is outside the scope of this study, it is nevertheless interesting to note that it is generally recognised as having many of the characteristics of a design activity, but one clearly bounded by certain parameters. Many of those parameters are determined by the technical and environmental constraints imposed by the specific DBMS and implementation environment in which the design must work. Thus there are clear functional criteria to which the design must conform. However, it is also accepted that, beyond some possible denormalisation decisions required by specific performance considerations, the physical design should not deviate in any fundamental way from the structure of the input logical or conceptual model. This is in contrast to many of the previously quoted guidelines for the use of the E-R and E-R/R approach, which begin with an almost clean slate and are guided by no functional criteria apart from the need to produce a communication aid and to avoid any implementation bias. Database designers are also expected to bring their past experience to assist in solving current problems and are also rewarded for innovative solutions which nevertheless conform to the constraints placed on the design.

Step 4. The creation of the external schemata

This step addresses the need to reproduce the original views for each individual user and is usually viewed as merely a task within the previous activity. It is recognised here as a separate step partly for consistency with the generic framework and partly to highlight a