An extended relational data base management system for engineering design

(1)

RIT Scholar Works

Theses

Thesis/Dissertation Collections

1986

An extended relational data base management

system for engineering design

Charles Fung

Follow this and additional works at:

http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please [email protected].

Recommended Citation

(2)

Approved By:

An Extended Relational Data Base Management System

for Engineering Design

by

Charles Fung

August 6, 1986

A thesis, submitted to the Faculty

of the School of Computer Science and Technology,

in partial fulfillment of the requirements for

the degree of Master of Science in Computer Science

1-'J3-~7

bfofes,61

J

eh'rey Lasky (thairman)

Professor Chris Comte

(3)

Rochester Institute of Technology, to reproduce my thesis in whole or in part. Any

reproduc-tion will not be used for commercial use or profit.

(4)

At present, the use of the relational model in the _{engineering design} support domain is

restricted due to the following: lack of _ability to handle complex objects, no support for

Abstract Data

Types,

inappropriate concurrency control for

long

transactions, no supportfor

versioning and update_propagation, poor_efficiency, and insufficient design rule checkingand

consistency contraints enforcement. A simple relational database management system is

designed and implemented under the UNIX _operating system to incorporate two major

extensions: support of user-defined Abstract Data Types and _operators, and built-in Design

Data Versioning. The

design,

implementation,

and possible extensions to these new facilities

aredescribed.

(5)

The author would like to express gratitude to Professor

Jeffrey

Lasky,

the chairperson

of the thesis committee, and to the other members ofthe committee for their invaluable gui

(6)

1. INTRODUCTION 1

1.1 Area ofInterest 1

1.2 Applicationofthe RelationalModel to

Engineering

Design 2

1.3 Problem Identification 3

1.4 Background andRecent Developments 4

1.4.1 Overview 4

1.4.2 Abstract DataTypes andComplex Objects 4

1.4.3 Version Control ofDesign Data 7

1.4.4

Concurrency

Control 8

1.4.5

Efficiency

ofRelational DBMS 9

1.4.6 Other Products 10

1.5 Research DBMS and Planned Extensions 12

1.5.1 A Basic RelationalData Base Management System 12

1.5.2 Design ofPlanned Extension 14

1.5.2.1 DefinitionofADT's and Associated Operations 14

1.5.2.2

Arbitrarily

Long

Text 15

1.5.2.3 Design Data Version Control 16

1.5.3

Summary

ofPlanned Extensions 18

1.5.4 Outline ofThesis 18

2. FUNCTIONAL SPECIFICATIONS 19

2.1

Operating

Systemand Implementation Language 19

(7)

2.2 Functional Definition ofSupported DBMS Commands 19

2.2.1 DefinitionofObjects in The Data Base 19

2.2.2 Relational Algebraic Commands 21

2.2.3

Versioning

Control Commands 24

2.2.4 Miscellaneous Commands 25

3. ARCHITECTURAL DESIGN OF DBMS 29

3.1 Introduction to Design ofResearch DBMS 29

3.2Overview ofResearch DBMS 29

3.3 User Interfaces 31

3.4 Dataand Control Flow 33

4. MODULAR DESIGN OF DBMS 35

4.1 introduction to Modular Design ofDBMS 35

4.2Data Base Subsystem 35

4.2.1 Relation Package 35

4.2.2 Record Package 38

4.2.3 Design Data Version Control 40

4.3 AbstractDataType Package 41

4.4DataType Manager 44

4.5 Expression Package 44

4.6 Outpututilities ofDBMS 47

5. TEST CASE FOR DBMS 49

5.1 Graphical Representationof DBMS Structure 49

5.2 DataSchema ofResearch DBMS 49

(8)

6. CONCLUSIONS ANDFUTURE ENHANCEMENTS

1

57

6.1 Conclusions from The Research DBMS 57

6.1.1 Research DBMS 57

6.1.2 Support for Abstract Data Types 58

6.1.3 Support for Design Data Version Control 59

6.2Suggested Future Enhancements to the DBMS 59

6.2.1

Indexing

for

Efficiency

59

6.2.2

Programming

Interface for DBMS 60

6.2.3 Compilation ofADTOperators 60

APPENDIX A - SYSTEM BUFFERS 61

APPENDIX B - PROGRAM LAYOUT 64

(9)

INTRODUCTION

1.1. Area ofInterest

The focus of this thesis is the use of relational data base management systems in

engineering environments.

Specifically,

the thesis considers two extensions to the relational

model which appear attractive for _{engineering design} environments. These extensions are the

support ofuser-defined abstract datatypes and _operators, and design data versioning.

Digital computers are

increasingly

being

used to automate different aspects of engineer

ing

design and manufacturing.

Typically,

computer support for these activities has been

implemented as acollection ofdisjoint programs, and so the _sharing ofdata between applica

tions has usually involved resource-intensive data conversion. While data base management

systems

(DBMS)

have been well developed and _widely adopted in the business community as

a powerful _{data processing} tool, their application in the engineering environments _{has only}

recently gained attention.

From a user's perspective, integration of different phases of the design process implies

the _capability to perform complex operations, such as _{design capture,} _analysis, or _synthesis,

on design information atany level of abstraction without concern for the _underlying mechan

ism of datastorage. As engineering designs become more _{complex, the} costs of such integra

tion through manual data conversion soon become prohibitive, due to high

(typically)

mainte

nance costs. It has become apparent that a centralized data

base,

containing all ofthe design

and _{manufacturing}

data,

would be the most effective and efficient approach _{for supporting}

(10)

Compared to other data models such as hierarchical and network _{models, the} relational

model provides a more

flexible

view of the _underlying data. To the engineer who _may not

have much knowledge about data _structures, the data is presented in its most natural

form,

which is tables (or more _{strictly speaking,} relations). The navigable links between the rela

tions are transparent and _easily comprehensible. Most important of _all, the relationalmodel

and _{its corresponding} _{relational algebra permit relationships} _to _be

dynamically

_created. _Rela

tionships are

implicitly

updated whenever the value of an item changes. The data schema

can be _easily altered without the need for major changes to application programs and the

data base. The real power oftherelational model comes from itsrelational_processingcapabil

ity

provided

by

its_{algebra, using} operations such as select,project andjoin. _{The ability}ofthe

user to obtain different views of the data through the use of the relational algebra facilitates

the efficient _sharing ofdata between applications as well as the dynamicchanging ofthe data

schema.

Furthermore,

in the _{engineering design} _{environments, the} data and the schema of the

data base are _usually in a dynamic state. Because of the high degree ofdata independence

maintained in the relational data

base,

these fluctuations are _efficiently accommodated.

Therefore,

most researchers agree that the relational system is the natural candidate for

engineering design applications.

Currently

available relational data base management systems exhibit major inadequacies

when their suitability to _engineering applications are explored. The research results reported

here include a design and implementation of a relational data base management system that

incorporates several enhancements to the relational model which are targeted at _overcoming

(11)

Early

attemptsin

introducing

DBMS to various _engineering fields were made

by

adopt

ing

existing systems, already in use, to store and manage engineering data. The reason for

using "offthe shelf DBMS was, of_course, to save the time and _money otherwise needed to

design a

totally

new system.

Unfortunately,

most of those attempts were unsuccessful,

because _commercially available DBMS were almost

invariably

designed _specificallyfor busi

ness use.

Many

of the earlier _attempts, as presented in

[MASA74],

[KORE75]

and

[WONG79],

were evaluated and found slow and awkward to use.

They

were expected to

generate solutions to all the problems of _{engineering design} _automation, but all were found

lacking.

A comparison ofthe characteristics ofbusiness and engineering data bases presented in

[SIDL80]

reveals anumber ofdiscrepancies between what commercial DBMS offer and what

an _{engineering data base} requires.

In general, the

following

are the major areas of _{identified deficiencies in existing} rela

tionaldata bases.

- Lack

of_abilityto handle complexobjects

- No

support forabstract datatypes

-Inappropriate concurrency controlfor

long

transactions

- No

support forversioning and update propagation

-Poor efficiency

- Insufficientdesign _rule

(12)

1.4.1. Overview

Numerous attempts have been made to solve some or all ofthe problems related to the

use of relational data bases in the_engineering design domain. Although most ofthis research

was carried out _separately, it was _generally aimed at common targets. Although some

attempts

failed,

quite a few did grow to _maturity with at least partial success. More

noteworthy examples include the Mosaic Object Manager

[ATW085],

IBM System R

[LORI83],

IBM SQL/DS

[LORI85],

the experiences of

[STON83]

and

[BARA85]

in

INGRES,

CORD Data System

[VU85],

and

Boeing

IPIP [JOHN85].

Instead of

tracing

each research attempt from its time of conception to

date,

agrouping

of different research efforts under specific major areas ofinterest is presented. This will be

followed

by

apresentation ofsome

interesting

products which are _commercially available.

1.4.2. Abstract Data Typesand Complex Objects

Design support applications are some ofthe most complex artifacts of human endeavor

in the sense that _they attempt to support one ofthe activities of the human mind most diffi

cult to reduce to rule and number creation.

Design activities confronted

by

engineers _today are _gettingso complex that somedesign

goals are impossible to achieve without the

help

of computer aided design tools. This situa tion is most prominent in the dynamic world of

Very

Large Scale Integrated

(VLSI)

Circuit

design. As pointed out in

[GADI85],

the VLSI design process can

be,

in an _already simpli

fied manner, decomposed into six hierarchical levels: asystem

level,

an InstructionSet Archi

tecture

(ISA) level,

a Register Transfer Level

(RTL),

a logic

level,

a device level and an art

work level. System level design aids, such as the Architecture Design and AssessmentSystem

(ADAS),

includes design capture, synthesis and analysis or verification tools that operate at

(13)

formthe same generic functions butoperate at the

logic, device,

or artworklevel

One _way throughwhich people _{normally deal} with_complexityis abstraction, and one of

the problems of

introducing

data base

technology

into design automation is that most design

objects require attributes with more general data types than those supported in current

DBMS.

To this _end, researchers in DBMS have turned to the use of Abstract Data Types (hereafter referred to as

ADTs),

which _{already have been extensively investigated in} pro

gramming language contexts.

Early

exploration in this area can be found in

[LOCK79],

[ROWE79], [SCHM78],

and[WASS79].

As defined in

[STON83],

an ADT is an encapsulation of a data structure, with its implementation detail hidden from the user, along with a collection of related operations

defined on this encapsulated structure. In order to use ADTs in a data

base,

their existence

has to be registered in the data base and then the allowable operationshave to be defined

by

procedures written in a language that supports both data base access and the ADTs.

Using

ADT's,

the designer can define and manipulate complex design objects at an abstract

level,

withoutworrying about theirlow level details. ADTs canalso limitthe accessto arelationin. prespecified ways, therebyguaranteeing ahigher level of_{data security} anddata integrity.

[LOOM85]

introduced a three-architecture framework _{for achieving} a _step toward

integration through logical data modeling. The three architectures are

Information,

Computer

Systems and Control Architectures, which can be

directly

mapped into the three-schema

approach at the logicaldata

level,

namelythe

External,

Internal andConceptualSchemas. The

three-schemaapproach has beenadopted

by

the IGES (International Graphics Exchange Stan

dard)

Committee in their development ofthe architecture for PDES (Product Definition and Exchange Standard). In their logical data model, an _entity is defined to be a class of an

(14)

fun-Generalization relationship enables a set of similar entities to be viewed as a single'

entity (in

artificial

intelligence terminology,

_this _is _an

"a-kind-of

or "is-a" relationship). An Association

relationship relates entities that are _{different but} not _necessarily on the same level of abstrac

tion (in artificial

intelligence terminology,

this is an

"an-instance-of

relationship). An Aggre

gation _relationship _{enables a collection} of dissimilar entities to be viewed as one entity. (In

artificial

intelligence terminology,

_this is a

"consists-of

or

"a-part-of

relationship).

While Association relationships have always been supported

by

all commercial

DBMS,

the rest are new data types. Aggregation relationships are better known to the computer sci

ence research _community as complex _objects; whereas generalization relationships

basically

encapsulate most of the work

being

done on what has been

broadly

termed Abstract Data

Types. Apart from more detailed descriptions of the properties of these relationships, the

authors in

[ATW085]

also introduced their Mosaic Object Manager which formed a testbed

for their data model.

Long

fields(an instance of

ADT)

contain _arbitrarily

long

bit strings and

character strings. For this

ADT,

unusual recovery and

buffering

techniques are employed

and are discussed in

[BANE85]

in relation to systemsimplementation. The proposed solution

is to maintain a pool ofdifferent sized blocks tobe allocatedto

long

fields ofdifferent

length,

so that block fragmentation can be kept to a minimum. This is similar to thefast file system

in UNIX. 4.2BSD.

The idea of aggregation relationships (complex objects), has been extensively investi

gated

by

researchers at the San Jose IBM Research

Laboratory

[LORI83],

by

implementing

them as extensions to their System R DBMS. Their implementation involved a _map for each

complex object

defined,

whichcontains the number of elementsand where they can be found.

Tuples in relations are assigned system-generated, unique identifiers at the time _they come

into existence, and two new columns are introduced to model complexobjects. These are the

(15)

tuples in the same and/or other relations. The other two types ofrelationships are modeled

using a reference column. Note that this new datatype is actually aphysicallink across rela tions at the

implementation

level and is actuallya departure froma true relational model, but

its use _actually resulted in a performance advantage. Theirwork _{has recently} materializedin

a new _{implementation based} on the relational product SQL/DS

[L0RI85],

with other exten

sions tohandle

long

transactions and object versioning.

1.4.3. Version Control ofDesign Data

In

[HAYN83],

anew _data type "accessattribute"

was introduced. Attributes

having

this

particular type _may take on values which can be used to access a particular "instance" of a

relation,

by defining

an additional "control" relation that contains

history

information. Each

instance of _any relation is uniquely defined

by

the relation name and the access attribute in

the control relation. Yet another method proposed was to add two columns in each relation,

namely "revin" and "revout". The revin column of a tuple represents the revision number at

which this tuple was added to the relation and the revout contains the revision number at

which this tuple becameobsolete.

On a more abstract

level,

a version of a high level object, (a complex object or even a

relation), is just another instance ofthat object. In the Mosaic Object Manager

[ATW085],

the most recentversion of an object is stored in a

fully

articulated

form,

while older versions

are stored _using a backward differential representation, resulting in a _{very low} storage over

head for older versions. Version paths are allowedto fork offto formalternatives,which can

be

individually

addressed _{using the} same version-id scheme as in specifying different ver

sions. Individual versions of high-level objects are used to group mutually consistent sets of

related _objects, and this provides much finer granularity in objects

locking

in _concurrency

(16)

ferent versions, each

being

_represented

by

another system-generated complex object that

holds all _necessary

bookkeeping

information.

_Alternatives _and _revisions _{are modeled}

by

logi

cal version clustersthat package versions of one designobjectintogroups.

1.4.4.

Concurrency

Control

As pointed out in

[LORI85],

one simple approach in solving the problem of

long

tran

sactions common in _{engineering data} base environments, is to let an engineer check out an

object from ashared

(public)

data base and store itin a private data

base, leaving

a nonvola

tile lock in the shared data base. When the design of that object has reached a consistent

state, the engineerchecksin the altered object back into the shared data

base,

thereby releas

ing

the lock. This is exactly the approach adopted

by

the extended System R [LORI83].

Recovery

iseffected, incaseof afailure

during

check

in, by

a

log

file recordingall changesto

the public data base when the checked outobject isreturned.

A mechanism for supporting nested transactions

[K.IM84]

has been proposed to allow

team members involved in the design ofthe same object to exchangeincomplete objects (par

tial _{consistency) but} it was not implemented because of its sheer _complexity and the con

straints its protocols impose on the user.

Instead, [LORI85]

turned their attention to versions

and an access control-based cooperation mechanism.

Basically,

users are assigned to one or

more usergroups. A group may

belong

to other groups. There is a predefined _group, called

PUBLIC,

that contains all groups as well as users thatare not _{in any}other group. Each com

plex object

(versions)

is owned

by

exactly one user or one _{group, resulting} in a number of

disjoint logical data bases. Users may only access the logical data bases that are owned

by

themselves or

by

a group to which _they

belong directly

or indirectly. In _{this way,} versions

(17)

It was pointed out in

[HAYN83]

that proper design ofa data

base,

to achieve normali

zation of the _{data schema,} can minimize data _redundancy and result in improved perfor

mance. Direct pointer links between tuples in the same relation or between tuples of dif

ferentrelations canbe usedtospeed _up dataretrieval.

Through their experience _{in extending INGRES} to handle ADTs

[STON83],

their con

clusion was that provision for access

by

spatial location will _{substantially} reduce search time

for geometric data. This can be implemented

by defining

a spatial

"bin",

or a _grid, with an

index which is a file _containing, for each

bin,

pointers to all the objects that might overlap

that bin. Extended_{secondary indices} for columns _{containing ADTs} are also implemented

by

allowing users to define their own

hashing

functions,

or some functions that will effect the

indexing

ofthe

ADT,

in INGRES.

Yet another approach was presented in

[LENZ84]

to add data base management func

tions in the operating system kernel to speed _up data access. Due to the fact that most exist

ing

DBMS are

disk-resident,

extensive overhead, in terms of disk

I/O,

is required when a

data base is searched.

They

defined a "persistent object"

to be a named collection of bytes

under the control ofthe data managementkernel. These "persistent

objects"

are mapped into

virtual memory when accessto themis requested through asystem call. The kernel provides

services to create, manipulate and delete a persistent object. It can be considered as a gen

eralization of the superblock in the UNIX* file system.

Being

first implemented in the

UNIX*

kernel,

it has the additional advantage of _generalizing advanced facilities such as

concurrency control, support of backout and commit, support of a _variety of share _options,

etc. in the kernel for all types ofdata management activities.

(18)

1.4.6. OtherProducts

Some successful (or at _{least partially successful)} products _resulting

from'

years of

research have _already been mentioned: Mosaic Object Manager

[ATW085],

IBM System R

[LORI83],

IBM SQL/DS

[LORI85]

and the experience of

[STON83]

in INGRES. The fol

lowing

are examples of other attempts notyet_mentioned, each

having

their unique features.

CORD Data System

[VU85]

is an integrated environment for project control and data

managementfor _engineeringdesign dataintroduced

by

CADTECCorporation. CORD is based

upon a relational model and includes desired features such as _versioning, _concurrency control

and access control. CORD includes an Alert feature which is an "automatic process"

invoked

by

the system based on specific design activities that affect the state of the project. This

facility

can trigger actions as simple as notification of specific users via mail upon detection

ofmodifications in a project _segment, or as complex as

initiating

user defined processes. In

CORD,

a project can be promoted ordemoted to different phases ofthe development process.

Alert can _{automatically} trigger processes to perform the necessary checks, such as _running

verification tools to check design rules or constraints that are required for the design to pass

to thenextpromotion level.

In

[BARA85],

INGRES is once again chosen as the fundamental

building

block on

which a modular system for

handling

design data was created. Instead of _{simply adding}

more routines to INGRES itselfas in

[STON83],

their implementation provides the user with

two

logically

separate modules which serve as an additional interface between the user and

INGRES (The user can still access INGRES facilities

directly,

if necessary). The Data Base

Interface

(DBI)

facilitates users'

access to the public data

base,

which is maintained

by

INGRES. It also supports complex objects, concurrency control and constraints enforce

ment. The second module is the Local Information

Processing

Subsystem

(LIPS)

that allows a

user to map out a subset ofthe public data base

(possibly

into local memory) after

freezing

(19)

INGRES (which is

basically

disk resident). The user has the choice of_{putting individual}

relations defined in his view on disk or in main memory. LIPS and DBI can be considered

as subsets of the extended System R _previously mentioned. Although LIPS has various

features that makes it resemble a

DBMS,

the biggest difference is that the data LIPS main

tains isusuallyvolatile.

Finally,

The IPAD project (Integrated Programs for Aerospace-Vehicle

Design),

which

began in 1976 at the

Boeing

_{Computer Services}

Company,

recently materialized into the

EMIS project

(Engineering Manufacturing

_Information

Systems)

[JOHN85]. IPIP (IPAD

Information

Processor),

one of the facilities provided in

EMIS,

currently supports multiple

data models, multiple levels of_{schemas, geometry}

data,

datasets, and _{concurrent, multi-user,}

multi-thread access through multiple application interfaces in a distributed environment

IPIP supports both therelational andnetwork data models via a single

family

ofdata

descrip

tion languages and a single data manipulation language. A _{data base may be} viewed simul

taneously through relational, network and hybrid schemas, and a logicalschema of one model

can be mapped to a schema of another data model freely. Another

interesting

feature ofthe

IPAD DBMS is its ability to allow users to declare rules and/orconstraints on data relation

ships within and between relations. Apart from allowing the users to turn rule checking off

and on, italso supports a tolerance capabilitythrough which users _{may specify} an approxima

tion of_equality to be used in qualification operations. This is

highly

desirable since _nothing

(20)

1.5. Research DBMS and_{Planned Extensions}

1.5.1. A _{Basic Relational Data Base Management System}

This is a simple relational DBMS that the authorhas built as part ofthe course work in

the

Fall,

1984 _offering of_{"Data Base}

Implementation".

_Through _{a simple} _interactive

interface,

the

following

operations onthe data base are supported:

- define

<relation> {<attribute> _<type>}

-append <relation>

{<attribute>}

-display

-project <relation> [over

{<attribute>}]

[unique]

- deletefrom

<relation> (where <attribute>=

<value>}

- join

-update <attribute> = <value> in <relation> {where <attribute> =

<value>}

-retrieve<attribute> from <relation> {where <attribute> =

<value>}

Each data base under this DBMS is grouped under a _{subdirectory in} the UNIX* file

system, together with its associated system relations. A

"r_tbl_ind"

table stores the names of

all the relations defined in the data base. Each tuple contains an index to a "r_tbl_dat" rela

tion which contains further informationof thatrelation such as the record size of each _tuple,

the file name under UNIX* that stores this relation, and

finally

an index to the "a_tbr. The

"a tbl"

records the attributes associated with each relation in the form of a linked-list. This

departure from the true relational data model is purely for efficiency and tofacilitate garbage

collection. Each relation is implemented as two separate files under the UNIX* file _system,

namely anindex file and adata file.

(21)

The

following

are general

features

ofthissimple DBMS:

(1)

All index

files,

including

that for the relation _table, are maintained as sorted and

sequential files. Retrieval _using _primary

key

_is performed

by

_reading in the

entire _{index file into} a buffer internal to the program and performing a

binary

search for the key. A hash table might be more efficient in

finding

one record

but it requires _reading in the table into virtual _memory and sorting it for each

display,

not to mention the undesirable effect of

having

to predefine the size of

the files.

(2)

A

binary

search either returns the locationof where the

key

was found or where

the new record should go ifthe

key

was notfound. Insertion of anew record in

an index file is in effect an insertion sort _whereby all records

having

a

key

less

than that of the new one are first written out to the

file,

followed

by

the new

record, and

finally,

the rest ofthe original file.

(3)

All data files are arranged as unsorted sequential records pointed to

by

pointers

in the _{corresponding} index file.

Memory

freed

by

delete operations is reused in

future append or define operations

by keeping

a "next free" pointer and an

"empty

record count"

in the header of each data file. Records in the attribute

table thatpertain to the same relation are linkedup

by

pointers.

(4)

The whole data

base,

including

the relation and attribute _tables, resides in

secondary storage at all times so that the DBMS program can be invoked and

terminated repeatedly without affecting the data stored.

Any

operation that

involves the updating or

inserting

of new records is not performed until the

correctness ofthe datahas beenchecked.

(5)

Reasonable error checking has been built into the program.

Missing keywords,

type mismatch or error in _accessing any part ofthe data

base,

as well as possible

(22)

aborted. An on-line quick reference of the legitimate commands and their syn

tax isalso

incorporated

intothe program.

The majorlimitations inthis DBMS are as follows:

(1)

The only data types supported are that of integer and character strings. To

better support user-defined

ADTs,

more base types should be supported (for

example, realnumbers).

(2)

The "where"

clause in conditional expressions _only support conjunctions. This

hasto be extendedto handle more complex expressions.

1.5.2. Design ofProposed Extensions

The

following

sections _present, insome

detail,

the proposed extensions tothe basic rela

tional data base presented insection 1.5.1.

1.5.2.1. Definition ofADT's andAssociated Operations

Support of ADT's involves

defining

a set of system

functions,

in addition to those

which implementthe basic relational _algebra, to allowusers to define new data types and the

new operations associated with each newtype. The

following

new operations are required:

-define_type"

<type> {<member> <type>)

-delete_type"

<type>

-display_type"

[<type>]

-define_optr"

-display_optr"

[<optr>]

-delete_optr"

<optr>

In addition to the system table Relations which stores information on the user-defined

relations in the data

base,

an additional set of system generated and maintained tables are

(23)

operations.

Basically,

a ADTNAME relation contains the name, number of members and the

length ofthe internal representation of each user-defined ADT. Another ADTMEMBER rela

tion is used to

keep

track of each ofthe individual members of each ADT and their data

types. _{Abstract data} _types can be defined as _elementary types or as _recursively predefined

ADT aggregates. Yet another OPERATORS relation is updated when a new operation is

defined on _{already defined ADTs} to reflect the _{name, cardinality,} precedence and types of

the _operand(s) and resultant This particular function of the DBMS also involves compiling

the user-defined functions that implement these operations and

linking

them into the DBMS.

Of_course, the idealcase is to linkthemin

dynamically

atrun time.

Furthermore,

the command parser has to be modified to recognize these user-defined

ADTs and know when and where to invoke the user-defined functions that perform the

user-defined operations on these ADTs. Assignment of values (ofsome other attributes orin

external _{representation)} to attributes of equivalent data types should be handled automati

cally. The internal representation and manipulation of ADT must always be transparent to

the user.

1.5.2.2.

Arbitrarily

Long

Text

One instance ofADT that is of importance in engineering design applicationsis that of

arbitrarily

long

bit and character strings for storing

images,

graphical andtextualdescriptions

ofdesigns. Instead ofrequiring the data base designer todefine his own ADT tohandle

long

text, a built-in textdata type will be implemented in the DBMS. The UNIX* file systemcan

be used to serve this purpose but this method uses additional inodes of the UNIX* file sys

tem unnecessarily.

Moreover,

the UNIX* file system imposes aupper limiton the maximum

number of files permissible on the system.

Therefore,

a more practical approach is needed.

This suggests that a mini file system that resembles the UNIX* file system should be

(24)

tained

by

the DBMS itself tomanage textual

information.

Instead of

imposing

alimiton the

size of the data space as in the UNIX* _file _system, _{data blocks} _should _be _linked

up as a

linked listtoallowforunlimitedgrowth.

A file separate

from

all other relations will be used for _storing fixed length records of

long

text or

long

strings of bit pattern. A "text"

datatype is

internally

represented

by

a

long

integer which is an index to the _starting record in this file. The subsequent records, ifany,

will be linked up in the form of a linked-list. Records freed from "deletion"

operation are

rechained in the "free list" in that file. This prevents large gaps of unusable _memory space

inside the file. This implementation was adopted to implement the "ajbl" as presented in

section 1.5.1 and was found to performsatisfactorily.

1.5.23. Design Data VersionControl

In order to support _versioning and configuration _management, an additional system

relation is required to store the

history

for different alternatives as well as the different ver

sions within each. This

"history

contains information such as alternative name, version

number, the date when it was started, and a description of the characteristics of that

version/alternative. The system also has to maintain two system-wide registers to

keep

track

ofthe current alternative and version that the user is _workingon at _any pointin time.

Each tuple in all relations will have

(

n * v

)

bytes associated with it to store the infor

mationconcerning the alternative and version when that tuple is "visible"

to the user, n is the

maximum number of alternatives of the same design allowable in that particular data base

and has to be defined

by

the user when the data base is created, v is the number of bytes

used to store the versioning information and is split into two equal parts. The first is the

revin which is the version when that tuple was first

introduced,

and the second part is the

revout which is the version when that tuple becomes obsolete and should therefore not be

(25)

(

revout <= _{current version}

) || (

revin > current version

) )

For _{example, using} one _{byte for versioning information}per alternative makesitpossible

to have a maximum of 15 different versions for each alternative

(

versions 0 - 14 ). Addi

tionalstorage overhead comes in the form ofthe additional bytes associated with each tuple,

but this is still _{a significant savings compared} _{to the}

memory required to duplicate tuples

shared

by

differentversions/alternativesthatare otherwise identical.

Referring

to the description ofthe basic DBMS in section

1.5.1,

each relation in adata

base is implemented as a sorted index file _containing the _primary

key

together with a ran

domly

ordered data file. The version bytes can be tagged onto the entries of the index file

and together_{they form} anew _primary

key

that _uniquelyidentifieseach tuple in therelation.

Data access procedures will have to be modified to recognize the existence ofthese ver

sion bytes. The particular set of version bytes to look at in a search operation depends on

which alternative is active at that particular moment.

Similarly,

update procedures will have

to _modify these special bytes in each tuple to _correctly reflect at which version and alterna

tive the update was carried out. Deletion of a particular tuple will be reflected as that tuple

being

obsolete from the current version onwards

(by

changing revout). The tuple still exists

physically in the data base so that it is visible ifthe user chooses to look at previous versions

of the same design. Updates of an attribute within a tuple which was carried over from a

previous version will require the duplication of the existing tuple to a new tuple, with revin

set to the current versionnumber.

Thereafter,

subsequent updates can be

directly

performed

(26)

1.5.3.

Summary

ofPlanned

Extensions

An extended relational _{data base} _management _{system will} _be _implemented _under _the

UNIX*_{operating systems,} _with_the

following

capabilities:

- basic data

types supported are

integers,

_{character strings} of predefined

length,

real

numbers, andtextof_arbitrary length

-support for basicrelational algebraic operations suchas _project, select andjoin

-provision for user to define ADTs as aggregates of_{elementary data} types and/or recur

sively defined ADTs

-provision for user to define new operations on _{already defined ADT}

by

_supplying C

functions that implementtheseoperators

-ability to maintain different alternatives and/or versions of design

data,

allowing users

toview_onlytheselected version ofthe current alternative

being

worked on

1.5.4. OutlineofThesis

Chapter Two contains the functional specifications of the DBMS. Chapter Three

presents the architectural backbone of its

design,

which should give the reader an overview

of the whole DBMS. Chapter Four breaks the DBMS down to its separate components and

show their design and implementationin detail. Chapter Five describes the test case used to

evaluate the DBMS.

Finally,

Chapter Six draws conclusions from the tests described in

Chapter Five and alsopresents possible future improvements to the DBMS.

(27)

FUNCTIONAL

SPECIFICATIONS

2.1.

Operating

System andImplementation Language

The UNIX* _operating system is

becoming

more and more _widely used in the engineer

ing

field both at the application and developmentlevels.

Therefore,

it is chosen as the target

operating system for the development of this extended DBMS. The programming

language,

C,

is the language chosen to implement the DBMS because ofits low level byte manipulation

capability. This DBMS is written as a_utility in the UNIX*universe and is command driven.

The

following

sectiondefines the functional capabilityofthis DBMS.

2.2. Functional Definition ofSupported DBMS Commands

2.2.1. DefinitionofObjects in The Data Base

An object in the data base can be a relation, an Abstract Data

Type,

or an operator

defined

by

the user.

Any

such object comes into existence after itis registered to the DBMS

by

the user _using one ofthe

following

commands. Like other forms ofdata in the data

base,

visibilityof such objects are controlled

by

the version control systemin the DBMS.

The command

definerei<rel_name> <att_name att_type>...

defines a new relation with name <rel_name>,

having

attributes with data types defined as

pairs of <att_name att_type> on the command line. The data type of each attribute must

UNIX isatrademark ofAT&TBellLaboratories.

(28)

either be a basic pre-defined data type or a _{previously defined} Abstract Data Type. The

relation file for the new relation is created and

initialized,

and a new tuple is added to the

"RELATIONS"

system relationto reflect the existence ofthe new relation. The attribute list

is storedinthe recordfile "ATTMEMS".

The command

defineadt<adt_name> <adtnameadt_type>.~

defines a new Abstract Data Type with name <adt_name>, which is made up of elements

specified as <adtname adt_type> pairs. The data type of each element must either be abasic

pre-defined data type or a previously defined Abstract Data Type. Upon the validation of

the specified datatypes for its elements, the definition ofthe new ADT is stored inthe rela

tion file "ADTS"

and the record file

"ADTMEMS"

to reflectthe existence ofthe new ADT in

the current version ofthe design data base.

The command

defineop

<op_name> <prec> <card> <ltype> <rtype> <restype> <filename>

defines a new operator that acts upon either predefined basic data types or _previously

defined Abstract Data Types (or a combination of both). <prec> defines the precedence of

the new operator with respect to other operators already defined in the data base. <card>

defines the _cardinality of the new operator and is either 1

(unary)

or 2 (binary). The data

types ofthe left and right operands are defined in <ltype> and <rtype>. The data type ofthe

result ofthe operation is specified in <restype>. The argument <filename> is apathname of a

file in which the userprovides assignment statements thatmakeup the exact sequence inper

forming

the operation. Basically, a complexoperator on ADT's is broken down into simplier

steps that invoke known operators (either pre-defined or user-defined) to act on the

operands. Each assignment statement is in the form of "<identifier>

(29)

keywords

"LEFT, "RIGHT,

and _"RESULT are used to reference the _{left operand,} right

operand, and the resultrespectively. In _addition,

temporary identifiers, %1, %2,

.',

%9,

can

be used to store

intermediate

_{results of}_evaluation. _{The data}

types of these _{temporary iden}

tifiers do not have to be

defined;

they

take on the data type of the result of an expression

when

they

are

first

set The left hand side ofthe

last.assignment statement

inthe filemust be

"RESULT.

2.2.2. _{Relational Algebraic Commands}

The

following

commands allow the user to input design data into the data

base,

as well

as to _specify queries on the data base _using standard relational algebraic operators such as

project, view and join. A more detailed functional specification of each available command

follows.

The command

append <rel_name> <att_value>...

is used to add a new tuple to the relation <rel_name>. The number of attribute values

<att_yalue> mustequal the number of attributes defined for that relation. For each _attribute,

the <att_yalue> must conform to the expected external representation of the _{corresponding}

datatype.

The command

display

[relations|adts|operators|versions|<re!_name>I

either displays the content of an entire relation

by

the name <rel_name>, or displays informa

tion in the system relations.

"Display

relations"

shows all the relations defined in the DBMS

in the current context, together with a list of their attributes and data types.

"Display

adts"

(30)

ele-ments and their data types.

"Display

operators"

gives the definition of all the user-defined

operators in their textual

form,

together with other

information

about those operators such as

their _{precedence, cardinality,} _operands _{and result} _data _types.

"Display

versions"

provides a

summary of all the defined versions _currently recorded in the design data

base,

each

being

accompanied

by

the timewhen that versionwas started and atextualdescription of each.

The command

project <rel_name> [over_{<att_name>...]}

is similarto the command

display

_{but instead}of_showing all the _attributes, project only shows

thevalues ofthe attributes specified in the attribute list in the "over" clause. The order ofthe

columns in the output is specified in the "over" clause. Notice that the "over" clause is

optional. Ifthe "over" clause is _omitted, this command reverts to adisplay.

The command

delete

[from|

<rel_name> [where <condition>]

is used to delete tuples froma relation specified in <rel_name> which satisfied the condition

in the "where"

clause. The <condition> in the "where" clause is specified as an expression

which, when evaluated, should yield a result of the type INT (integer). If the optional

"where"

clause is left out, the entire relation will be removed in the current version of the

design. Ifatuple or arelation is carried over froma previous version or another alternative,

its deletion in the current version does not affect its existence in previous versions or other

alternatives.

The command

retrieve<att_list> from <rel_name> [where <condition>)

(31)

attributes, if

they

are defined as

ADT's)

specified in the <att_list> from the relation <rel_name>. The optional "where" _clause _limits _the _output _{to those tuples that}

satisfy the

<condition>.

The command

view <assignlist> from <rel_name>[where _{<expr_list>[}[into <new_rel_name>[

is a variant ofthe retrieve command which allows users to obtain an alternative view of the

data in the data base. Instead of

displaying

the retrieved data in the form that they are

stored, the user can _specify alist ofassignment statements

(<assignlist>)

to transformthe data

through some additional operations to give new _meaning to the retrieved data The optional

"where"

clause serves the usual function of

limiting

the output to certain tuples in the relation

<rel_name> that _satisfy the <condition>. The optional "into" clause instructs the DBMS to

generate a new relation

by

the name of <new_rel_name> to contain the result of the view

operation. The

facility

allows the user to

dynamically

create a new "view" ofthe data in the

design data base and treat itas _any other _existingrelations.

The command

join <rell_name> <rel2_name> on <att in rell> = <att In rel2>

performs an equi-join on the relations <rell_name> and <rel2_name>

by

_checking for

equivalence ofthe <attin rell> and <attin rel2>columnsin the two relations.

The command

update <rel> <assignlist> [where <condition>]

updates the values of attributes specified in the assignment list <assignlist> in the relation

<rel>. The new values are the result of evaluation of the right hand side of the assignment

statements in the <assignlist>. Ifthe optional

"where"

(32)

the relation will be _updated, otherwise _only those tuples that _satisfy the <condition> will be

updated. Ifthe affectedtuple is carried overfrom previous versions or another _alternative,a

duplicate tuple is made and the old tuple is marked obsolete while the actual update is per

formed

on the duplicatedtuple. The duplicatedtuple is _onlyvisible _starting from thecurrent

versionbutnot inolder versions.

2^3.

Versioning

Control

Commands

Alternatives

are defined to be different instances ofthe same design which are meantto

be worked onsimultaneously. Themaximum number ofdesign alternatives is fixed at compi

lation timeofthe DBMS.

Currently,

the maximumnumber ofalternativesis set at four.

Different versions of the same design are allowable in each alternative. A particular

version ofa design is built upon older versions (with the exception of version zero). When a

new version is

defined,

the state ofthe design data base in the previous version is frozen and

alterationsto the new version will _{only be}reflected inthe new version and will not affect the

state of affairs in the older versions. The maximum number of versions in each alternativeis

again pre-defined at compiletime.

Currently,

this value is set at sixteen.

A particular versionin _any alternative is identified

by

the combination ofits alternative

and version numbers in the form <alternative_no>.<version_no>. Maintenance of _versioning

information is handled

by

the DB Subsystem (refer to Chapter Three for adescription ofthe

DB

Subsystem)

and is transparentto the user as wellas the higher level modules. _{At any}one

point in time, one and _only one combination of alternative and version is active. The user

can _only see the data that is supposed to be visible in the current version. _{The only} _way of

viewingdata in another versionis to change the current context to thatversion.

The command

(33)

changes the _currently active alternative to <atl_no> to allow the designer to

freely

switch

between different alternativesofthe design.

The command

newalt

creates a new alternative in the design data base and carries over all relevant data in the

design data base that are _currently visible. The new version number will start at zero for the

new alternative and the new alternative will become the _currently active alternative.

The command

chvers <version_no>

changes the _currently"active version to <version_no> to reflect a different state ofthe design

data base.

The command

newvers

freezesthe current version and changes the contextto the next higher version.

The command

showversion

displays the currently active alternative and version numbers in the form

<alternative_no>.<version_no>.

2.2.4. Miscellaneous Commands

The

following

commands serve to provide the user with a more user-friendly environ

(34)

The command

setvar in <rel>

$<var_name>

= _{<expr> where <condition>}

allows the user to _{set a variable named <var_name>}

(arbitrary)

to the result ofthe evaluation

ofthe <expr>.

Basically,

_this _command _is _used _to _store _the _{result of}_a_{retrieve operation} _in _a

temporary

repository for

future

use (instead of_simply

displaying

it on the screen and then

throwing

it away). The data type ofthe variablewill _{be automatically}setto that ofthe result

ofthe expression <expr>. Once set, the content of a variable can be referenced and used in

later operations as an operand in expressions in"where"

clauses or assignment statements.

The command

showvar

[$<var_name>|

shows the value ofthe variable $<var_name> in its external representation. Ifno argument is

provided, the values of all the variables that are_previouslyset will be displayed.

The command

unsetvar$<var_name>

makes the variable $<var_name> extinct and frees the internal variable table _{entry originally}

occupied

by

that variable. That entry can then be reused for another variable. The internal

variable table size is currently set at 50.

The command

!!<commands>

appends <commands> to the lastexecuted command to form a new command to be executed

by

the system. If <commands> is null, the last executed command is executed again. This is

basically

provided to save typing on the

(35)

The command

!<commands>

invokes

a shell escape to the UNIX* shell where the <commands> are performed. This

allows the user to make the best use ofthe

facilities

provided

by

the UNIX* operating sys

tem.

The command

prints out a one page concise

help

manual thatsummaries all the available commands in this

DBMS.

Thecommand

< filename

instructs the system to read in command lines from the file

by

the name offilename. Each

line read in

by

the DBMS is echoed to the output and the user is prompted for a confirma

tion before the command is actuallyperformed. This

facility

allows the users to create com

mandfiles for

frequently

repeated operations.

The command

filename

is identical to the above command except that no confirmation is required from the user.

Command lines are read in and

directly

executed.

(36)

The command

> filename

writes out the last executed command to the file namedfilename. Ifthe file already exists,

the command is appended to the end of the

file;

otherwise the file is created and the com

(37)

ARCHITECTURAL

DESIGN

3.1. Introduction to DesignofResearch DBMS

This chapter starts with an overview ofthe structural design ofthe Data Base Manage

ment _{System (section 3.2). The DBMS is broken} _up _into _three _distinct_levels. _At _the _highest

level,

there is the user interface to interpret user input and queries. In the middle lies the

center of _control, the "DB

Manager",

who _{takes up the task} of

delegating

different tasks to differentsections in the organization. At the lowest

level,

The DB "Subsystem"

willinterface with the UNIX* file system to store and retrieve data in thedata base which is maintained in

secondary memory at all times.

Section 3.3 describes the interfaces between the DBMS and its environment. Section 3.4

showsthe internal dataand control flow ofthe whole system.

3.2. OverviewofResearch DBMS

Figure 3.1 shows an overview of the structure of the Data Base Management System.

The "User Interface" is responsible for parsing user's commands to _query the data base.

Early

error detection is attempted to check for obvious syntactical errors in the command

line. The "Command Line Parser" parses the command line up into tokens and passes it

down to the DB Manager for further processing.

The "DB Manager" examines the parsed command line to determine which ofits agents

should be called to handle the request. For example, the "Operator Manager"

maintains

UNIX isatrademarkofAT&T BellLaboratories.

(38)

USER

INTERFACE

;

OUTPUT

\

MANAGER

V

COMMAND

LINE

PARSER

DB

MANAGER

;'bPE&W;TYE^

MANAGER

<

^MANAGER

:>.

MANAGER

[

l\

\f

DB

SUBSYSTEM

I

RECORD

;

j

RELATION

'MAKIA/aCD '

: _MAKIA^rp

'I i# V \SJ*mT\* , I MlIWLl\

RECORD

FILE

RELATION

^

(

RECORD

^

(

RELATION

\

(

REL

AT I

J

I

FILE

J

I

FILE

J

\

FILE

(39)

information

pertinent to basic operators _{such as addition and} _{multiplication,} _as _well _as

user-defined operations defined on _{Abstract Data Types. This} _particular _agent _is _responsible _for

the

identification

andevaluation ofbasic pre-defined operators as well as user-definedopera

tors on ADT's. The "Version Manager"

keeps track ofthe current alternative and version of

the design data base and _determines which parts of the design data base should be visible in

the current context The "Type Manager"

is responsible for

identifying

both

basic,

prede

fined

data types as well as user-defined Abstract Data Types. The "Expression Manager"

provides support _{for parsing} and evaluationofmathematical expressions.

Information pertaining to the request is gathered as needed

by

the "DB Manager"

from

its agents. Control is then passed down to the "DB Subsystem"

to extract the "raw" data from

the relations. The "DB Manager"

then invokes the appropriate utilities under its control to

examine and format the "raw" data. If the data matches the conditions set forth

by

the

request, it is sentto the "OutputManager"

and queuedfor output.

The "DB Subsystem"

handles all the lowest level data storage and retrieval

by directly

interfacing

withthe UNIX* file system. Storage format and retrieval methods are madetran

sparent to the higher level routines so _they can view the data at a more abstract level. Design

data version control is alsohandled

by

the "DB Subsystem"

and its work in this respectis vir

tually

invisibleto the higher levelmanagers.

Detail description of various sectors ofthe DBMS willbe presented inChapter Four.

33. User Interfaces

This DBMS is command driven. The user can interface with the DBMS

by

_entering

on-line commands, or may choose to group commands in a file and execute them as a batch.

A "quick reference"

manual is included to assist the user in his/her communication with the

DBMS.

Furthermore,

access to the the UNIX* _operating system is provided. Helpful

(40)

UNIX*

features

include shell escapes and the _ability ofthe DBMS to read commandsfroma

file or _storing user _query commands in a _{file for future} use. Chapter Two contains detailed

functional

specificationsof all theavailablecommands supported

by

the DBMS.

In order to support the definition ofADT's upon _{previously defined}

ADT's,

the notion

oflist and sublist is

introduced

to the command parser.

Normally,

upon input ofdata from

the user

terminal,

the space character is considered to be the field separator.

However,

the

user can override this

by

_enclosing_{any string} ofcharacters within _{matching curly brackets}

(

{

,

}

). The command line parser _simply parses each command line into tokens or lists of

tokens and passes them backto the "DB Manager".

Curly

brackets in theinput can benested

to form sublists within a list. Each time a _string is parsed, the outmost pair of _matching

brackets will be stripped to reveal the list inside them, so that sublists are broken up only

when_they need to

be,

butnot before that.

This notation for the representation oflists and sublists is maintained across the input

and output boundaries of the DBMS. Data items that represent ADT's are enclosed within

curlybrackets when _they are presented to the user.

All relations are stored as individual files in the UNIX* file system. The DBMS inter

faces with the UNIX* file system through the "DB

Subsystem",

which is the only module in

the whole DBMS that has direct access to the "raw" data stored in the data files. The "DB

Subsystem"

interfaces with the UNIX* file system through low level system calls such as

"seek",

"read" and "write". This restriction of direct access to the "raw" data to _only one

module makes this interface much cleaner and hides the details of version control from the

higherlevel modules.

(41)

3.4. Data and Control Flow

The "DB Manager"

communicates with its agents and the "DB Subsystem"

mainly

through global data structures which have

fixed formats.

These buffers are passed through

the_{whole system and are modified as} _appropriate.

For example, a particular relation comes into view when its relation buffer

(_RELA-TIONBUF)

is created

by

the "DB Manager". Information aboutthatrelationis thengathered

by

different managers as this buffer passes through their departments. For example, a

_RELATIONBUFstructure is defined as follows:

typedefstruct

{

char _{rel_name[R_NAME_LEN];}

char _{rel_fname[F_NAME_LEN];}

int _{rel_fd;}

int _curindex; int _{next_free;} int _{empty_records;} int _tuplesize;

char

*data;

}

_RELATIONBUF;

The "DB Manager"

first allocates _{dynamic memory} for this data structure and fills in the

rel_name

field,

which contains the logical name ofthe relation. The structure is then passed

down to the "DB Subsystem" to look for a relation

by

that name. If a relation exists

by

that

name, the actual file name under the UNIX* file system that is used to store this relation is

placed in rel_fname. The field tuplesize will then take on a value which is the number of

bytes used for the internal representation of one tuple in that relation. This

"relationbuf

is

then passed back tothe "DB Manager".

Dynamic memory is then allocated for the data field

by

the "DB Manager". This data

element in the structure is used as a repository for receiving data from the "DB Subsystem"

when datafrom that relation isactuallyretrieved fromthe relation file. When the relationfile

is opened

by

the function call"openrelfile", the rest ofthe information in the datastructure is

(42)

filled in

by

the appropriate

functions

in the relation package. The "curindex"

is used to

keep

track ofthe position ofthe current tuple in that relation. The internal

function,

'getnexttu-ple", uses this

"relationbuf

structure to trasport the data retrieved from the next physical

tuple ina relation. The "nextjree"

and "_emptyjrecords"

fields are used _only

by

the "DB Sub

system"

for garbage collection purposes and are _usually not _the concern of higher level

modules. Upon the completion of a _transaction, thememory claimed

by

these buffers can be

freed and reused. Appendix A contains a complete list of all the system buffer structures

used globallyin the DBMS.

A relation has to _{be explicitly} opened

by

the openrelfile function call before tuples can

be retrieved from that relation.

However,

there are certain system-generated and

system-maintained relation/record files that are opened when the system is

initially

started up and

are kept opened until the program is terminated. The

following

is a list of these system

relation/recordfilesand theircontent:

RELATIONS - User-defined

relations

ATTRIBUTES - Attribute lists for_each _{user-defined relation}

ADTS - User-defined Abstract Data Types

ADTMEMS - Lists _{of members and} theirdatatypes for eachADT defined

TEXTRECORD - Record file for Textual information

VERSIONS - Alternatives _{and versions} defined

ADTOPS - Operators defined _onADT's

by

_users

ADTOPATTS - Attribute lists foreach ADToperator

ADTOPSTATS - Statement listsfor _each ADT operator

The main purpose for maintaining these relation/recordfiles opened at all times is toimprove

the performance of the DBMS. All "raw" data of user-defined _relations, as well as

system-generated and system-maintained relation/record files are _{kept in secondary memory} and

(43)

MODULAR

DESIGN

OF DBMS

4.1. Introduction to Modular Design ofDBMS

This chapter presents the DBMS detail designand describes the implementationof each

module. The "DB Manager"

has _{already been} introduced as the center of data and control

flow in the DBMS (Chapter

Three)

and will not be considered here. Section 4.2 explains in

more details the major parts of the "DB Subsystem"

and their functions. Section 4.3 details

the implementation in this DBMS for _{supporting ADT's} and their operations. Identification

and

handling

of different data types in the data base is explained in section 4.4. Section4.5

describes the mechanism for _{parsing, storing} and evaluation of expressions such as those in

"where"

clauses and assignment statements.

Finally,

the function ofthe "OutputManager" will

be introduced in section4.6.

4.2. Data Base Subsystem

The "DB Subsystem" is the lowest level module thathas soleaccess to the "raw" data. Its

major components are broken down intothe

following

subsections.

4.2.1. Relation Package

There are only two formats ofdatastorage visibleto the data base designer. The first is

relations, whichs consist ofunordered, fix-sized tuples. The second is records, which is made

up of_any number ofordered, fixed-sized data blocks. This will be described in moredetails

insubsection 4.2.2.

The relation package is responsible for the storage and retrieval of data stored in the

(44)

HU'I'JFBK 4 niPTT_aBC_conifr 6

alt 1 alt 2 DATA.

11111111 11000000 data of tuple 1

liilllll llllllll data of tuple 2

llllllll 11100000 data ot tuple 3

OOOOOOOO oooooooo eapty blk l

oooooooo oooooooo

aapty-blk 2

oooooooo oooooooo opty blk 3

oooooooo oooooooo oapty blk 4

oooooooo 111X1111 data at tuple 4

11110000 llllllll data of tuple E

oooooooo 11111000 data of tuple 6

oooooooo oooooooo snpty blk 6

oooooooo oooooooo _{eaptr blk} 6

oooooooo ooooooo