A Collection Model for Data Management in Object-Oriented Systems


Moira C. Norrie

A thesis submitted for the degree of Doctor of Philosophy,
to the
Faculty of Science, University of Glasgow

December 1992.


Abstract

This thesis addresses the question of how to provide data management services in object-oriented systems with reliable persistent object stores. It proposes an object data model, called the collection model, which serves as a foundation for the construction of such services. The collection model is general in that it is independent of any particular implementation platform. In part, this independence is achieved through the separation of the data model from the underlying type model.

There are two components of the collection model - a structural model, BROOM, and an operational model based on an algebra of collections. The structural model is semantically rich and exhibits properties of both the entity-relationship and semantic data models. Unary collections are used to represent entity categories and binary collections to represent relationships between entities. Classification structures are based on the notion of a collection family which represents various forms of conceptual dependencies among the collections of a family.

The requirements for supporting the various forms of evolution in object-oriented database systems are presented. An extension to the collection model is proposed to support object evolution whereby objects can migrate within classification structures. Two existing realisations of the collection model are described. One is a prototype, single-user system implemented in Prolog. The other forms the basis of the Object Data Management Services of the Comandos platform for distributed, object-oriented applications.


Acknowledgements

Much of the work reported in this thesis forms part of the Esprit project 2071, Comandos. I acknowledge the contributions of the rest of the Comandos project group at Glasgow, particularly those of the ODMS group: they are Steve Blott, Colin Dunlop, David Harper, Arkadi Kosmynin and Andrew Walker. Special thanks are due to David Harper for taking on the responsibility for the ODMS group and the supervision of this thesis. His contributions to the design of the collection model were invaluable. The implementation of the ODMS was undertaken by Steve Blott and Andrew Walker and they, along with David, should take the credit for turning a dream into reality. Colin Dunlop and Arkadi Kosmynin were responsible for the development of database tools for use with the collection model. Further thanks are due to Steve for his comments on drafts of this thesis and to Colin for his considerable help with the diagrams.


Contents

1 Introduction
  1.1 Database System Requirements
    1.1.1 Effectual Data Management
    1.1.2 Expedient Data Management
    1.1.3 Efficient Data Management
  1.2 The Collection Model
  1.3 Structure of Thesis

2 Foundations of Data Models
  2.1 Philosophical Foundations
  2.2 A General Framework
  2.3 Data Models
  2.4 Object-Oriented Database Management Systems

3 The Structural Model: BROOM
  3.1 An Overview of BROOM
  3.2 Collections and Collection Families
    3.2.1 Collection ...
  3.3 A Metacircular Description of BROOM
  3.4 A Z Specification of BROOM

4 Semantic Modelling in BROOM
  4.1 Relationships
  4.2 Aggregation, Association and Generalisation

5 The Operational Model
  5.1 Operational Levels
  5.2 A Collection Algebra
    5.2.1 Operations on Collections
    5.2.2 Operations on Binary Collections
    5.2.3 Operations on Ordered Collections
    5.2.4 Operations on Sets
    5.2.5 Operations on Bags
    5.2.6 Specifying Collection Operations in Z
  5.3 Example Queries
  5.4 Properties of the Collection Algebra

6 Evolution
  6.1 Object Creation and Deletion
  6.2 Object Evolution

7 Realising the Collection Model
  7.1 COLLEEN
  7.2 The Comandos ODMS

8 Conclusions
  8.1 Contributions


Chapter 1

Introduction

With the advent of persistent programming systems and persistent object stores, some of the functionality of a database management system (DBMS) is provided by a general persistent system. We are forced to ask ourselves questions such as: “What functionality of a DBMS is not supported by a persistent store?” and “What distinguishes a database system from a persistent system?”.

To answer these questions, we must first examine the basic requirements of a database system. This enables us to establish a clear characterisation of both database systems and persistent systems and thereby assert the additional functionality required of a database management system. We can then address the issue of how to provide this functionality of a database management system in an object-oriented system with a persistent object store.

1.1 Database System Requirements

A database system is a software system which supports a data-intensive application. In the 60’s and 70’s, such applications tended to be relatively simple and straightforward in that the structure and use of data was regular and well-known. For example, database systems were developed for stock control, payroll and patient record applications.


Aided Design and Computer Assisted Software Engineering. These application systems model complex human processing systems and support a wide range of activities. For example, an Office Information System encompasses support for all office activities rather than supporting a single activity such as the maintenance of personnel records.

To provide computer support for an activity, we must first be able to describe that activity and, in the case of complex human activities such as those of designing a ship or producing a large software system, it is an intricate task to produce such a description. One of the aims of the designers of a database management system is to provide the database application system designer with a set of concepts that assist the construction of such an application model.

The main characteristic of all data-intensive applications is the representation of a large number of interrelated entities of the application domain. Each entity will be represented in the database system by a value or an association of values. Here, the term ‘value’ is used in its most general sense to mean any valid data item of a particular system. A value may be a simple value, such as the integer 3 or the string john, or a complex record or object value.

Instead of an entity being represented directly by a single value, it may be the case that it is represented by an association of values. For example, a particular person might be represented by the association of the string value john and the string value 24 High Street. Associations of values are also used to represent relationships between entities. If a particular person entity is represented by the value p, and a particular department entity is represented by the value d, then the fact of that person being employed by that department may be represented by the association (p, d).
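The representation of entities by values and by associations of values can be sketched in a few lines of Python. This is only an illustration: tuples are assumed to stand in for associations, and the names person_p, dept_d, employed_by and is_employed_by, as well as the department value, are hypothetical and do not come from the thesis.

```python
# A minimal sketch, assuming Python tuples stand in for "associations of values".
# All names below are hypothetical and chosen only for illustration.

# An entity represented by an association of two simple string values.
person_p = ("john", "24 High Street")

# An entity represented directly by a single simple value (hypothetical value).
dept_d = "sales"

# A relationship between entities represented by the association (p, d).
employment = (person_p, dept_d)

# A set of such associations records who is employed by which department.
employed_by = {employment}


def is_employed_by(p, d):
    """Return True if the association (p, d) is recorded."""
    return (p, d) in employed_by
```

Here the person is itself an association of values, while the department is a single value; the relationship is simply a further association of the two.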

The form of representation of an entity, or a relationship between entities, will depend on the forms of values and associations of values supported in a particular system. An object-oriented system can represent entities directly by means of object values whereas a relational database system only supports simple values and generally entities of the application domain have to be represented through associations of values held as tuples of relations. Further, for a given system, the database designer may have a choice as to the form of representation of particular entities and relationships.

It follows that a database system is concerned with the management of a large number of values and associations between values. The general requirements of such a system can be expressed in terms of ‘the four e’s’. If a database system is to be effective, the management of values must be effectual, expedient and efficient.


system whether that user be a database administrator, application programmer or end-user. For a system to be effective, it must also be efficient in that operations must be performed within an acceptable time period.

Note that these three requirements of database systems are not independent. Further, some features may contribute to more than one requirement. For example, support for data independence which divorces the logical and physical descriptions of data has benefits in terms of both user convenience and efficiency of both system operations and programmer productivity.

Each of these three requirements is now examined in some detail.

1.1.1 Effectual Data Management

If a database system is to satisfy its intended purpose, it is fundamental that it ensures the validity of the data values that it manages. This means that the system must protect the data against loss or corruption due either to failure or user abuse. Furthermore, the data values must be current in that the database must reflect the effects of successfully completed operations on the database. Thus, there must be some form of long-term, reliable storage of data values in order that the database will persist between application program executions and beyond machine shutdowns or failures.

For values to persist beyond program execution, there must be some form of persistent address space. In conventional systems, the logical unit of persistence supported by the operating system is the file and the persistent address space is controlled by the file manager.

Most general purpose programming language systems adopted this basic model of persistence by having a single form of persistent value - the file. Usually, a file corresponds to a sequence of uniformly formatted records. It was recognised that there are two major problems with such a system. Firstly, all data values which the programmer wishes to persist must be converted to an appropriate file format. Secondly, since the persistent data is effectively stored in raw format, there are no mechanisms to ensure that the data is used ‘safely’.

To overcome these problems, there arose the notion of a persistent programming language in which any form of data value could persist. Two of the first persistent programming languages were developed independently in the East and West by Zamulin [Zam77, Zam78, Zamdd] and Atkinson [ACC81, ABC+83], respectively.


file structures and vice versa. Hence, the use of persistent programming languages for data-intensive applications reduced both programming effort and program code. Further, since these systems store type information along with the data values in the persistent store, they provide support for ensuring the safety of data use, i.e. that it is used in a correct and meaningful way.

To ensure the resilience of the system to machine failures, the persistent values must be held on non-volatile storage. Magnetic disks are the most widely used non-volatile medium for the storage of large quantities of current, updatable data values. We distinguish current data values from archive or back-up data, which may be held on magnetic tape. Also, we distinguish databases which may be updated from those that are a read-only information store and may be held on a read-only medium such as CD-ROM. We note that with the development of systems with large non-volatile RAM, there are a number of research projects investigating main memory database systems in which all persistent values are held in RAM.

Therefore, typically the persistent values of a data-intensive application system will be held on magnetic disk. The persistent values associated with an application system are termed, collectively, the database of the system. In some persistent programming languages, it is possible to access more than one database in a program and therefore the persistent values of an application system may be stored as one or more databases. Those persistent values accessed by an application will be mapped into the program’s virtual address space as required.

As stated previously, the system must ensure that the effects of successfully completed operations are reflected in the database. On the other hand, the system must prevent operations which fail to complete successfully from leaving the database in an inconsistent state. It is therefore necessary to introduce the notion of a logical unit of processing - the transaction. A transaction is an atomic operation on the database in that either it completes successfully and all of its effects are reflected in the database or it fails to complete and none of its effects are reflected in the database.

A successful transaction completes with a commit operation. All persistent values which have been created or updated in the transaction’s virtual address space will be mapped into the persistent address space. If the machine were to fail while the persistent values are being written to the non-volatile storage, some of these updates could be lost and the database left in an inconsistent state. To prevent this, a log of the transaction’s processing and outcome is maintained and, at the point of commit, this log is written onto stable storage. In the event of machine failure before the updates are written to the database, examination of the log record during recovery shows that the decision was taken to commit and the new database state can be constructed from the information in the log.


the transaction’s virtual address space will not be mapped into the persistent address space. In this way, the database is unaffected by any processing performed by the transaction before abortion.
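The interplay of commit, abort and log-based recovery described above can be sketched as follows. This is a minimal illustration under stated assumptions: a Python dictionary stands in for the database on non-volatile storage, a list stands in for the log on stable storage, and the class Store with its methods is hypothetical rather than a mechanism defined in the thesis.

```python
# A minimal sketch of commit logging and redo recovery, assuming dictionaries
# stand in for the database and a list for stable storage. Names are hypothetical.

class Store:
    def __init__(self):
        self.database = {}   # persistent values on non-volatile storage
        self.log = []        # log records held on stable storage

    def commit(self, tx_id, updates, crash_before_write=False):
        # The log record is written to stable storage at the point of commit.
        self.log.append({"tx": tx_id, "outcome": "commit", "updates": dict(updates)})
        if crash_before_write:
            return  # simulate machine failure before the database is updated
        self.database.update(updates)

    def abort(self, tx_id):
        # An aborted transaction's updates are never mapped into the database.
        self.log.append({"tx": tx_id, "outcome": "abort", "updates": {}})

    def recover(self):
        # Replay, in log order, the updates of every committed transaction.
        for record in self.log:
            if record["outcome"] == "commit":
                self.database.update(record["updates"])


store = Store()
store.commit("t1", {"x": 1})
store.abort("t2")
store.commit("t3", {"x": 2, "y": 5}, crash_before_write=True)
state_after_crash = dict(store.database)
store.recover()
state_after_recovery = dict(store.database)
```

After the simulated failure the database holds only the effects of t1; recovery reconstructs the state that includes t3 from the log, while the aborted t2 leaves no trace in the database.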

One of the early motivations for database systems was the separation of data from programs, thereby allowing a central data repository which could be shared by a number of application programs. This notion can be generalised to the separation of knowledge from the application of knowledge. Thus, there can be a shared repository of knowledge about some application domain and a particular application program can utilise whichever parts of this knowledge repository it requires and apply this knowledge in whatever way it chooses.

This generalisation is introduced here to emphasise that the same principles of persistence apply whatever forms of values may persist. Thus, some persistent programming languages, such as PS-algol [ACC81], [ABC+83] and Napier [MBCD89], have introduced procedures as first-class objects and one form of persistent value is the procedure. Similarly, a deductive database system is based on the logic programming paradigm in which knowledge about the application domain is recorded as a set of facts and a set of general rules. Therefore, the persistent store of a deductive database system must store both values which are facts and values which are rules. In object-oriented systems, an object has associated methods and the implementations of these methods must also persist.

Whatever the forms of persistent values supported by a system, it may be a requirement of the system that these values can be shared by a number of application programs. There must be some mechanism to ensure that two or more application programs accessing the same persistent store do not interfere with each other’s operation. An application program may incorporate one or more transactions where a transaction is an atomic operation on the database. Hence, what is really required is some form of control to prevent conflict among concurrent transactions which operate on the same persistent store. This means that the effects on the database and the output of the transactions must be the same as if these transactions had run in isolation.

The system must support some form of concurrency control mechanism which either restricts access to the persistent values or restricts the commit operation of transactions in such a way that the effect of processing a set of transactions concurrently is equivalent to the effect of having processed these transactions one after the other in some specified order, i.e. the transactions are serializable.


when it arises and recover from it by the rather costly process of aborting transactions. This can be achieved by having transactions operate on their own local copies of persistent values and then checking before the commit operation if their operation is in conflict with that of any transactions that have committed during that transaction’s processing. In the case of conflict, the commit operation is not allowed to proceed. In general, the pessimistic approach is preferred where the likelihood of conflict is high and the optimistic approach is preferred where the likelihood of conflict is low.

In summary, effectual data management requires some form of persistent store and transaction management. The transaction management should at least provide some form of recovery mechanism and, if the persistent store is to be sharable, then it must also support some form of concurrency control mechanism. Many variants of mechanisms for recovery and concurrency control have been proposed and here only a very brief description of their requirements has been provided. A good comprehensive description of the various issues and proposed mechanisms is given in the book by Bernstein, Hadzilacos and Goodman [BHG87].
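The optimistic approach described above, in which transactions work on local copies and are checked for conflict at commit, can be illustrated with a small sketch. It assumes one simple validation rule (a transaction fails if any transaction that committed during its processing wrote a value it read); the classes OptimisticStore and Transaction are hypothetical and are not taken from the thesis or from [BHG87].

```python
# A minimal sketch of optimistic concurrency control with validation at commit.
# The class and method names are hypothetical, not taken from the thesis.

class OptimisticStore:
    def __init__(self):
        self.values = {}
        self.committed = []  # write sets of committed transactions, in commit order

    def begin(self):
        return Transaction(self, start=len(self.committed))


class Transaction:
    def __init__(self, store, start):
        self.store = store
        self.start = start      # number of commits seen when processing began
        self.read_set = set()
        self.local = {}         # local copies of updated values

    def read(self, key):
        self.read_set.add(key)
        return self.local.get(key, self.store.values.get(key))

    def write(self, key, value):
        self.local[key] = value

    def commit(self):
        # Check against transactions that committed during our processing.
        for write_set in self.store.committed[self.start:]:
            if write_set & self.read_set:
                return False    # conflict: the commit is not allowed to proceed
        self.store.values.update(self.local)
        self.store.committed.append(set(self.local))
        return True


store = OptimisticStore()
store.values["stock"] = 10
t1, t2 = store.begin(), store.begin()
t1.write("stock", t1.read("stock") - 1)
t2.write("stock", t2.read("stock") - 3)
t1_ok = t1.commit()
t2_ok = t2.commit()
```

Both transactions read the same value; the first to commit succeeds, and the second is refused because the value it read has since been overwritten. In practice the refused transaction would be aborted and rerun.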

In addition to the above, there should also be security mechanisms to prevent unauthorised access to data. The two basic forms of access control mechanisms are list-based or token-based. List-based mechanisms associate with a persistent value (or group of values), a list of authorised users and their access privileges. A token-based system permits access to a persistent value (or group of values) by users with the appropriate token. The second kind of access control is the basis of the capability systems [LS76].
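The difference between the two forms of access control can be made concrete with a short sketch. The user names, the value name salaries and all function names are hypothetical; a real capability system would rely on tokens protected by the system itself rather than on random strings held in a dictionary.

```python
# A minimal sketch contrasting list-based and token-based access control.
# All names are hypothetical; real systems use tokens protected by the system
# rather than plain random strings.
import secrets

# List-based: each persistent value carries a list of authorised users
# and their access privileges.
access_lists = {"salaries": {"alice": {"read", "write"}, "bob": {"read"}}}


def list_based_allows(user, value_name, privilege):
    return privilege in access_lists.get(value_name, {}).get(user, set())


# Token-based: whoever presents a valid token gains the privileges bound to it.
issued_tokens = {}


def issue_token(value_name, privileges):
    token = secrets.token_hex(16)
    issued_tokens[token] = (value_name, frozenset(privileges))
    return token


def token_based_allows(token, value_name, privilege):
    name, privileges = issued_tokens.get(token, (None, frozenset()))
    return name == value_name and privilege in privileges


read_token = issue_token("salaries", {"read"})
```

In the list-based check the decision depends on who the user is; in the token-based check it depends only on what the caller holds.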

Distributed systems require further extensions to the persistent store and transaction mechanisms. The persistent address space spans several nodes and a transaction might access persistent values across the nodes of this address space. Persistent values might be replicated at different nodes to increase the levels of locality of access and availability of data. Consequently, the recovery, concurrency control and security mechanisms have to be extended to take this into account. For example, some variant of the two-phase commit protocol might be used to ensure that the effects of a transaction are committed at all participating nodes or none of the participating nodes.
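The all-or-nothing property of two-phase commit can be sketched with in-process objects standing in for nodes. This is a deliberately simplified illustration: the Node class and the two_phase_commit function are hypothetical, and the node failures, timeouts and logging that a real protocol must handle are omitted.

```python
# A minimal sketch of a two-phase commit coordinator, assuming in-process
# objects stand in for nodes. Names are hypothetical; failures, timeouts and
# logging, which real protocols must handle, are omitted.

class Node:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.pending = None
        self.values = {}

    def prepare(self, updates):
        # Phase 1: hold the updates and vote on whether they can be committed.
        self.pending = dict(updates)
        return self.can_commit

    def commit(self):
        # Phase 2 (all voted yes): make the pending updates permanent.
        self.values.update(self.pending)
        self.pending = None

    def abort(self):
        # Phase 2 (some node voted no): discard the pending updates.
        self.pending = None


def two_phase_commit(nodes, updates):
    votes = [node.prepare(updates) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.abort()
    return False


a, b = Node("a"), Node("b")
ok = two_phase_commit([a, b], {"x": 1})
c = Node("c", can_commit=False)
failed = two_phase_commit([a, c], {"x": 2})
```

The first transaction is committed at both nodes; the second is committed at neither, because one participant votes against it.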

1.1.2 Expedient Data Management


To assist the users in their maintenance and retrieval activities, it is vital that they be presented with a conceptual model of the application reality. This conceptual model is a metalevel description in terms of the general concepts of interest in the application domain. It is the task of the database designer to construct the conceptual model.

The set of general concepts will be based upon the recognition of the existence of similarities among entities and their associations. An entity has a set of properties each of which may be either structural or operational. From now on, when we refer to the properties of an entity, it is assumed that we refer only to those properties of interest in a particular application system rather than all properties exhibited by the entity. A structural property is referred to as an attribute of the entity and it has an associated value. For example, an entity might have a name attribute with the value john. The set of structural properties, or attributes, of an entity determine the form of the entity. An operational property specifies an operation that the entity can perform. The set of operational properties of an entity characterises its behaviour.

If a set of entities have similar form and behaviour, then a general description of the properties of these entities may be introduced and these entities will have a common representation in the database. In this way, there is a move from the particular to the general and the introduction of a metalevel description of the representation of these entities referred to as a type. A type gives a general description of a value in terms of the properties that value must hold and we say that a value with those properties is an instance of that type.

A system may provide up to four basic kinds of types. Firstly, the system may support a number of primitive types for which the structural and operational properties are predefined. For example, commonly the primitive types integer and boolean are available. Secondly, the system may support a number of structured types for which the operational properties are predefined but the structural properties are not. For example, record and enumerated types are structured types; they have fixed operations - such as selectors on records - but the structure of the records is specified by the user. Thirdly, the system may support operational types in which the structure is fixed and operational properties are specified by the user. Examples of these are procedure or function types. Fourthly, the system may support abstract types for which both structural and operational properties have to be specified by the user. These are abstract data types or object types.
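As a rough illustration, the four kinds of types can be mapped onto familiar Python constructs. The mapping is an assumption made for this sketch, not part of the thesis, and the names Address, Colour, Discount and Account, together with their sample values, are hypothetical.

```python
# A minimal sketch mapping the four kinds of types onto Python constructs.
# The mapping is an assumption made for illustration; names are hypothetical.
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# 1. Primitive types: structure and operations are both predefined.
count: int = 3
flag: bool = True


# 2. Structured types: operations (e.g. field selection) are fixed,
#    but the structure is specified by the user.
@dataclass
class Address:
    street: str
    town: str


class Colour(Enum):
    RED = 1
    GREEN = 2


# 3. Operational types: the structure (a callable with a signature) is fixed,
#    the behaviour is specified by the user.
Discount = Callable[[float], float]
half_price: Discount = lambda price: price / 2


# 4. Abstract types: both structure and operations are specified by the user.
class Account:
    def __init__(self, balance: float = 0.0):
        self._balance = balance

    def deposit(self, amount: float) -> None:
        self._balance += amount

    def balance(self) -> float:
        return self._balance


home = Address("24 High Street", "Glasgow")
account = Account()
account.deposit(10.0)
```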

Note that just as an entity may be a particular of several general concepts, its representation value may have properties additional to those of a given type and it is therefore possible for a value to be considered as an instance of many types - provided that it has the properties of each of those types.
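The idea that one value may be an instance of many types, provided it has the properties each type requires, resembles structural typing. The sketch below assumes Python's typing.Protocol as an approximation of a type viewed as a description of required properties; the names Named, Addressed and PersonValue, and the age value, are hypothetical.

```python
# A minimal sketch, assuming structural typing via typing.Protocol approximates
# the notion of a type as a description of required properties.
# The type and attribute names are hypothetical.
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Named(Protocol):
    name: str


@runtime_checkable
class Addressed(Protocol):
    address: str


@dataclass
class PersonValue:
    name: str
    address: str
    age: int  # a property additional to those required by either type


john = PersonValue("john", "24 High Street", 30)

# The same value counts as an instance of both types, since it has the
# properties that each of them requires.
john_is_named = isinstance(john, Named)
john_is_addressed = isinstance(john, Addressed)
```

Note that the runtime check only tests for the presence of the named properties, not their types, so it is a loose approximation of the notion in the text.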


operations on entities. An operation is external to an entity if it is not a property of that entity but rather a property of an encompassing entity. For example, in a library system the operation of borrowing a book may be viewed - not as an operation of a borrower or of a book - but rather as an operation of the encompassing library entity; then the borrow operation would be external to book entities. These external operations characterise the envisaged ‘usage’ of an entity. We adopt the term category to refer to such entity groupings.

The classification of entities is represented in the database by named collections of values representing the entity categories. A metalevel description of a collection is given by a collection scheme (cf. relation scheme) which specifies the name and nature of the collection and the type of its members. The nature of a collection determines whether a collection may contain more than one occurrence of any value and whether the member values are ordered; it corresponds to a particular form of bulk data type such as a set, ordered set or bag.
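A collection scheme, as described above, names a collection and fixes its nature and member type. The following sketch shows one way this could look; it is an illustrative assumption, not the formal definition given later in the thesis, and the names Nature, Collection, Names and NameBag are hypothetical.

```python
# A minimal sketch of a collection scheme: a name, a nature and a member type.
# The class names and the three natures modelled are illustrative assumptions;
# the thesis defines collections formally in later chapters.
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Nature(Enum):
    SET = "set"                  # no duplicates, unordered
    ORDERED_SET = "ordered set"  # no duplicates, ordered
    BAG = "bag"                  # duplicates allowed, unordered


@dataclass
class Collection:
    name: str
    nature: Nature
    member_type: type
    members: list = field(default_factory=list)

    def insert(self, value):
        if not isinstance(value, self.member_type):
            raise TypeError(f"{value!r} is not of type {self.member_type.__name__}")
        if self.nature is not Nature.BAG and value in self.members:
            return  # sets and ordered sets hold at most one occurrence
        self.members.append(value)

    def contents(self):
        if self.nature is Nature.SET:
            return set(self.members)
        if self.nature is Nature.BAG:
            return Counter(self.members)
        return list(self.members)


names_set = Collection("Names", Nature.SET, str)
names_bag = Collection("NameBag", Nature.BAG, str)
for n in ["john", "mary", "john"]:
    names_set.insert(n)
    names_bag.insert(n)
```

Inserting the same three values yields two members in the set but three occurrences in the bag, which is exactly the distinction the nature of a collection is meant to capture.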

These categories of entities are not independent since they form part of a classification structure for the application domain. Classification structures are represented in the conceptual model by the specification of conceptual dependencies among the collections. Further, certain categories of entities will be related through the sorts of relationships in which their members may participate and this is also represented in the conceptual model by conceptual dependencies among collections.

A conceptual model may therefore be regarded as having a three-level structure. At the lowest level there are the general descriptions of individual entities of the application domain in terms of types. These entities are grouped together into categories according to their roles in the application domain and this is modelled in the conceptual model by collection schemes which describe the collections of values representing these categories. Finally, the categories are related to each other both through their participation in classification structures, and, in terms of the relationships that may exist between their members; this is described in the conceptual model as constraints on collections. This structuring of the basic notions of conceptual models is illustrated in figure 1.1.

The application reality contains person entities and house entities. These entities are grouped into categories of persons and homes, respectively. Then we wish to represent the relationships between persons and the homes they live in.


[Figure 1.1: Application Reality (entities, categories, relationships), Conceptual Model (types, collection schemes, constraints) and Database (values, collections); the links between them are labelled ‘modelled by’, ‘represented by’, ‘described by’ and ‘instantiated by’.]

Conceptual Model
  types: person, house
  collection schemes:
    Persons : set of person
    Homes : set of houses
    Lives : set of [person,houses]
  constraints:
    Lives <-> Persons to Homes

Database
  values:
    person: fred, john, mary
    house: brick, straw
  collections:
    Persons {fred, john, mary}
    Homes {brick, straw}
    Lives { (fred,brick), (john,straw), (mary,straw) }


Then the database is an instantiation of the conceptual model in that it represents particular entities of the application reality, their roles and the relationships between them. Here it is assumed that a person entity is represented by a simple name value e.g. john, and similarly, for a house entity e.g. brick. Then the collection Persons is the set {fred, john, mary}, and the collection Homes is the set {brick, straw}. The relationship between persons and the homes they live in is represented by collection Lives which is a mapping from Persons to Homes. For example, (fred, brick) is a member of Lives and this represents the fact that fred lives in the brick house. For a given database to be an instance of a conceptual model, the collections of the database must be as specified in the collection schemes and must satisfy the constraints of the model.
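The Persons, Homes and Lives example can be written down directly, with the constraint checked by a small function. Python sets are assumed to stand in for collections; the function name satisfies_constraints is hypothetical, and the two checks are only one reading of the statement that Lives is a mapping from Persons to Homes.

```python
# A minimal sketch of the Persons/Homes/Lives example, assuming Python sets
# stand in for collections. The function name is hypothetical and the checks
# are one reading of "Lives is a mapping from Persons to Homes".

Persons = {"fred", "john", "mary"}
Homes = {"brick", "straw"}
Lives = {("fred", "brick"), ("john", "straw"), ("mary", "straw")}


def satisfies_constraints(persons, homes, lives):
    # Every pair must relate a member of Persons to a member of Homes.
    well_typed = all(p in persons and h in homes for p, h in lives)
    # A mapping associates each person with at most one home.
    sources = [p for p, _ in lives]
    functional = len(sources) == len(set(sources))
    return well_typed and functional
```

A database in which fred also lived in the straw house, or in which Lives mentioned a person outside Persons, would fail the check and so would not be an instance of this conceptual model under the reading assumed here.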

The conceptual model is constructed by the database designer and is the basis for the use of the application system. The database designer specifies the conceptual model in terms of a conceptual modelling language. To assist the database designer in their task and also to ensure that the resulting conceptual model is understood by the users, this language must provide a number of general constructs that are adequate for the description of data-intensive application systems. The key to making such a language effective is that it should be simple: this means that it should be based on a small number of easily understood concepts and should be orthogonal in the application of these concepts.

Since the conceptual model imposes a structure on the values of the database, the conceptual modelling language is commonly referred to as a data modelling language - and the constructs on which it is based as a ‘data model’. Thus, a data model could be regarded as a database designer’s toolkit in that it provides the basic components for constructing a conceptual model. The term ‘data model’ is somewhat confusing since it is not a model but rather a theory for which models may be constructed. However, the term is in common usage and therefore will be adopted here. But we would like to emphasise the distinction that will be used between the terms ‘data model’ and ‘conceptual model’. Here, we use ‘data model’ to mean the set of basic constructs and the term ‘conceptual model’ to mean the model of a particular application reality that is expressed in terms of these constructs. The conceptual model of a database is often referred to as the database schema.


1.1.3 Efficient Data Management

An application system could be designed from scratch and thereby tailored for the intended applications to obtain optimal performance. However, in general, this would be a very expensive solution and, in the long term, may prove very inefficient as it does not cater for system evolution. A database system will evolve over time in terms of its use, its requirements, its structures and its values. This means that instead of ‘hard-wiring’ a system to specific database and application characteristics, it is desirable that the system is adaptable to change in such a way that good performance is still attained.

A database may contain very large collections of values and these may be part of a complex overall structure. Indeed, the values which represent individual entities of the application domain may themselves be large and/or complex. The size and complexity of the database can result in very high costs for retrieval operations. It should be borne in mind that a so-called retrieval operation may not be a simple look-up operation - but may involve complex processing operations on values in the database. As databases are increasingly becoming an integral part of complex application systems, such as scientific processing and design systems, such processing operations can be expensive both in space and time.

Update operations may also be expensive, not only in terms of the time to locate the values required and any processing costs, but also in terms of ensuring that the overall consistency of the database is maintained. Thus, an update operation may incur high overheads such as constraint checking activities.

If the system takes several minutes (or even hours) to perform the required operations, then effectively the system may be unusable. This is particularly true in transaction processing systems, such as airline reservation systems, that typically involve a large number of small transactions with a low critical response time.

Different applications may have conflicting requirements. Thus the location and/or representation of collections of values suited to one application may differ from that appropriate to another application. It may be possible to meet the requirements of both applications through replication of data or multiple representations but this then introduces additional overheads in ensuring the consistency of the database. It is important to remember then that the objective of the person responsible for the system must be the overall efficiency of the system rather than the optimal solution with respect to any one application.


[...] reasonable standards of efficiency regardless of specific data and application characteristics. This can be done through the provision of support for a small number of general constructs and various implementations of these constructs to suit data and application characteristics. Further, operations on a database should be specified at a logical level which is independent of physical representation and implementation: the system can then determine the methods of evaluation through consideration of the data characteristics and the current forms of representation. Hence, as the database system evolves, the underlying representations and implementations can be evolved in turn without recourse to the application programmers or end users.

The requirements for efficient data management are not orthogonal to those for expedient data management. Both require the notion of a data model which provides a small number of constructs in terms of which one can model the application domain with relative ease. Also, they both require the notion of a query language which enables the user to specify operations on the database at a logical level in terms of the conceptual model and independent of physical representation.

A given collection of values may have a number of possible representations. The most appropriate representation will depend on the size of the collection in terms of the number of values it contains, the characteristics of the member values and the envisaged operation characteristics. If a collection contains only a few members, then a very simple representation such as a linked list of values may suffice. If however a collection is large then a linear search to retrieve a particular value would be unacceptable and therefore some form of index structure would be maintained.

Typical index structures employ variants of hashing techniques or B-trees (or some combination of these). An index structure is built over some property or combination of properties of the values of the collection depending on access patterns. For example, if there are many selection operations on a collection of person objects based on the surname of a person, then it is reasonable to construct an index of the collection based on the surname values. The choice of index structure can take into consideration the stability of the collection and also the characteristics of the properties over which an index is constructed. For example, some forms of index structure are easily expanded as the size of a collection grows, whereas others incur significant overheads if the variation in the size of the collection is high. Further, the distribution of the property values is significant in the selection of an index structure.
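A minimal sketch of a hash-based index over the surname property is given below. It is written in Python for illustration; the `Person` type, the `build_index` function and the sample data are assumptions and do not come from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical person object with two properties.
    forename: str
    surname: str


def build_index(collection, key):
    """Group the members of a collection by the value of one property."""
    index = defaultdict(set)
    for member in collection:
        index[key(member)].add(member)
    return index


people = {Person("Ada", "Smith"), Person("Bob", "Jones"), Person("Cy", "Smith")}

# Index the collection on surname, then select by surname without a scan.
surname_index = build_index(people, lambda p: p.surname)
smiths = surname_index["Smith"]
```

A hash index of this kind supports equality selection only; range selection over the same property would call for an ordered structure such as a B-tree.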

It is, of course, possible to maintain several index structures over a single collection. But it is important to remember that the incurred overheads of maintaining an index structure can be high as the index will have to be updated as values are updated. For this reason, it is clearly preferable to maintain index structures over relatively stable properties.


[...] index structure over a given collection may be made either by the database system administrator, who has overall responsibility for the system, or by the system itself. It is possible for a system to gather statistics on access patterns and, on the basis of these, decide when to construct a new index structure - or delete an existing one. At present, such systems are rare; however, in a number of systems an index may be constructed purely for the evaluation of a particular query.

The method of evaluation of a particular operation on a collection, or set of collections, depends upon the representation of those collections and also the characteristics of those collections. For example, if an operation is to select specific objects from a collection based on the values of some property of those objects, then the method of evaluation depends upon whether or not that collection has an index structure on that property. The size of a collection is also significant in selecting the implementation of an operation.
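The dependence of the evaluation method on the representation can be sketched as follows. This Python class is a hypothetical illustration, not the system described in this thesis: the same logical selection is answered through an index when one exists on the property and by a linear scan otherwise.

```python
class Collection:
    """Hypothetical collection of records (dicts) with optional indexes."""

    def __init__(self, members):
        self.members = list(members)
        self.indexes = {}  # property name -> {property value: [members]}

    def add_index(self, prop):
        """Build an index over one property of the members."""
        index = {}
        for member in self.members:
            index.setdefault(member[prop], []).append(member)
        self.indexes[prop] = index

    def select(self, prop, value):
        """Return (method used, matching members) for an equality selection."""
        if prop in self.indexes:
            return "index", list(self.indexes[prop].get(value, []))
        return "scan", [m for m in self.members if m[prop] == value]


staff = Collection([{"surname": "Smith"}, {"surname": "Jones"}])
before = staff.select("surname", "Smith")  # evaluated by linear scan
staff.add_index("surname")
after = staff.select("surname", "Smith")   # evaluated through the index
```

The caller's request is identical in both cases; only the chosen method differs, which is the separation of logical and physical levels that the text argues for.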

A given query on a database is specified in a query language in terms of the conceptual model. This query can be translated into an algebraic expression in terms of operations on collections. Then the selection of an evaluation plan for the query expression consists of two stages. First, the query expression is transformed into an equivalent expression based on the algebraic properties of the operations. This stage is generally known as logical optimisation. Then an evaluation plan is constructed for the transformed query expression and this plan takes into account the physical representation of the collections and also their characteristics. This stage is known as physical optimisation.

The issues of query optimisation are well understood in the area of relational database systems. These were the first systems to make a clear separation between the logical and physical levels of a database system such that the end-users interact with the database only in terms of its conceptual model (or views thereof) and the system deals with the translation of operations specified at the user level into operations at the physical level.
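Logical optimisation can be illustrated with a single rewrite rule. The Python sketch below assumes set-valued collections and uses the standard law that selection distributes over union; the tuple encoding of expressions and the names `evaluate` and `push_selection` are hypothetical and are not the algebra defined later in this thesis.

```python
def evaluate(expr, env):
    """Evaluate an expression tree over named set-valued collections."""
    op = expr[0]
    if op == "coll":
        return env[expr[1]]
    if op == "union":
        return evaluate(expr[1], env) | evaluate(expr[2], env)
    if op == "select":
        pred, sub = expr[1], expr[2]
        return {x for x in evaluate(sub, env) if pred(x)}
    raise ValueError(f"unknown operator: {op}")


def push_selection(expr):
    """Rewrite select(p, union(a, b)) as union(select(p, a), select(p, b))."""
    if expr[0] == "select" and expr[2][0] == "union":
        pred, (_, left, right) = expr[1], expr[2]
        return ("union",
                push_selection(("select", pred, left)),
                push_selection(("select", pred, right)))
    return expr


def is_even(x):
    return x % 2 == 0


env = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}}
query = ("select", is_even, ("union", ("coll", "A"), ("coll", "B")))
rewritten = push_selection(query)

# Both forms denote the same collection; only the evaluation order differs.
same_result = evaluate(query, env) == evaluate(rewritten, env)
```

After the rewrite, each selection applies directly to a stored collection, so the physical optimisation stage could use an index on that collection if one exists.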


1.2 The Collection Model

From the foregoing discussion on database system requirements, it should be clear that while a general persistent programming language supports effectual data management, it does not support all of the features for expedient and efficient data management required by data-intensive applications. A persistent programming system provides support for the persistence of individual values, but it does not provide explicit support for the notion of a database as a representation of an application domain in terms of interrelated collections of values. In other words, the persistent system has no notion of a data model as discussed in the section on expedient data management, and there is no distinction between the logical and physical levels of representation. Clearly, as such systems have no notion of a database, they also have no notion of a query language that expresses operations on a database.

It is important to emphasise that although some persistent systems have been extended to support collections, this still falls short of our database system requirements as there is no explicit support for expressing the sorts of conceptual dependencies among these collections required to model classification structures and relationships. Further, it is important to emphasise that we are not presenting these as deficiencies of persistent systems but rather are highlighting the distinction between database systems and persistent systems. A database system is based on a persistent system - but it has additional facilities appropriate for the support of data-intensive applications. Indeed, a persistent programming language extended to support the notions of a data model and query language is a database programming language. For example, the database programming language Galileo [ACO85] is a persistent programming language with semantic data model features and abstraction mechanisms designed to support database application programming.

In this thesis, we address the questions of what these additional facilities should be and how they should be provided in the context of object-oriented systems. The proposed collection model is intended as a general model on which to base the provision of data management support in an object-oriented system. Although the model was developed in the context of a particular object-oriented platform, the general model is independent of the underlying persistent object system. In this way, the reported work differs from that of database programming languages such as Galileo in that the data model is not tightly integrated with a specific programming language.


We separate out these two parts of the collection model to emphasise that while the operational model is dependent on the structural model, the structural model can be supported independently of the operational model. While we advocate the extensive use of high-level queries in application programming, the application programmer can choose to adopt the structural model as a means of modelling their application domain and then use basic iterators and navigational techniques to implement their applications directly. This incurs penalties in terms of supporting system evolution in that the application code is then dependent on the physical representation and may not be able to take advantage of new index structures. However, this may be appropriate to certain applications and certainly the use of the structural model alone is still beneficial.

The basic concepts supported in the structural model are entity categories, relationships between entities and rich classification structures both of entity categories and relationships. An entity category is represented by a collection of atomic values. These atomic values may be any values supported by the underlying type system and, in the case of objects, these will be object references. Entity relationships form relations between entity categories and these are represented by collections of pairs of atomic values. Thus a relationship between two objects will be represented by a pair consisting of the references of those objects.

Both entity categories and relations between categories can be part of classification structures. These classification structures allow entities to be considered as belonging to different roles in the application. For example, a person entity might at one time be considered as a staff entity, at another time as a lecturer entity, and at yet another time as a tennis player. There are conceptual dependencies between these roles to indicate, for example, that lecturer is a specialised role of staff and therefore every lecturer entity is also a staff entity. In a similar way, relations can also be specialised. For example, given relationships between persons and their associated departments, then the relationships between staff and the departments which employ them would be a specialisation of the more general association.
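The role example can be sketched with Python sets, treating a specialisation as a subset dependency. This is an illustration only: the sample data and the name `is_specialisation` are assumptions, and strings stand in for object references.

```python
# Unary collections: entity categories (strings stand in for object references).
persons = {"ann", "bob", "cat", "dan"}
staff = {"ann", "bob", "cat"}
lecturers = {"ann", "bob"}

# Binary collections: relations as sets of pairs of references.
associated_with = {("ann", "cs"), ("bob", "maths"), ("cat", "cs"), ("dan", "cs")}
employed_by = {("ann", "cs"), ("bob", "maths"), ("cat", "cs")}


def is_specialisation(sub, sup):
    """True if every member of sub is also a member of sup."""
    return sub <= sup


roles_ok = is_specialisation(lecturers, staff) and is_specialisation(staff, persons)
relation_ok = is_specialisation(employed_by, associated_with)

# The specialised relation is also restricted to staff in its first position.
domain_ok = all(person in staff for person, _ in employed_by)
```

The same subset test serves for unary and binary collections, which reflects the uniform treatment of entity categories and relations described above.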

The structural model was given the name BROOM (Binary Relational Object-Oriented Model) to emphasise the importance of support for the direct representation of relationships between entities. Although very simple, the entity-relationship model [Che76] has proved very popular in the modelling of the structural properties of application domains. The basic concepts in this model are entity categories and relationships.


[...] methods of these objects and therefore are effectively buried within objects, the overall structure of the application domain is not readily apparent. By decomposing relationships in this way, we cannot handle a relationship as a single logical unit. As Rumbaugh states:

“... it is not possible to separate the abstraction from the implementation with the same clarity as the relational model.”

Further, in the design of large systems, relationships have been shown to be a useful abstraction mechanism for partitioning systems into subsystems. Recently, there have been a number of proposals for some form of extension to object-oriented models to support relationships as first-class objects.

The semantic data models [HK87], [PM88] might be considered as a development of the entity-relationship models that support classification structures based on is-a relationships between entity categories. Since one of the fundamental concepts of object-oriented data models is that of subtyping and inheritance, these are often considered to support classification structures. However, they often omit support for the rich conceptual dependencies that can arise in classification structures - such as categories partitioning other categories, the fact that certain categories are mutually exclusive and also the idea of supporting alternative classification views. These have been incorporated into the BROOM model through the concept of collection families.

The operational model is based on an algebra of collections. This mirrors the relational algebra, which was fundamental to the success of the relational model. Its success was due to its simplicity, uniformity and high-level query languages stemming from the introduction of the single generic collection type - the relation. The basis for its high-level query languages was an algebra of operations on these collections as opposed to the notion of operations on individual data records that had underpinned the network and hierarchical data models. Unfortunately, the drawback of the relational model is that it is just too simple and lacks semantic structure.
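Two of the dependencies named above, mutual exclusion and partitioning, can be stated as simple checks over sets. The Python sketch below is illustrative only; the function names and sample data are assumptions and this is not the definition of collection families given in Chapter 3.

```python
def mutually_exclusive(*collections):
    """True if no value belongs to more than one of the collections."""
    seen = set()
    for collection in collections:
        if seen & collection:
            return False
        seen |= collection
    return True


def partitions(parent, *collections):
    """True if the collections are disjoint and together cover the parent."""
    return mutually_exclusive(*collections) and set().union(*collections) == parent


students = {"ann", "bob", "cat"}
undergraduates = {"ann", "bob"}
postgraduates = {"cat"}

is_partition = partitions(students, undergraduates, postgraduates)
```

Subtyping alone would record only that each subcategory is contained in Students; the partition check additionally requires that the subcategories do not overlap and that no student is left unclassified.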

Although there have been some proposals for an algebra which operates on collections of objects, a number of object-oriented database systems use object-at-a-time processing and have thereby lost the advances of the relational model in terms of its high-level query processing. A collection algebra can form the basis of high-level query languages for object-oriented systems and, importantly, the optimisation of query expressions and query evaluation strategies.

In effect, the operational model also supports operations on a database in that an operation involving one collection can generate operations on other collections as determined by the conceptual dependencies among collections. For example, deleting [...] collections which are dependent on that collection. Thus, if we were to delete a particular student object from the collection Persons then we would also have to delete it from the collection Students if there is a dependency that every member of Students is also a member of Persons.
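The propagation of a deletion through such a dependency can be sketched as follows. The Python `Database` class and its method names are hypothetical; they only illustrate the idea that deleting from Persons generates a deletion from Students.

```python
class Database:
    """Hypothetical database of named collections with subset dependencies."""

    def __init__(self):
        self.collections = {}
        self.dependents = {}  # name -> names of collections that must be subsets

    def add_collection(self, name, members, subcollection_of=None):
        """Register a collection; the parent, if given, must already exist."""
        self.collections[name] = set(members)
        self.dependents.setdefault(name, [])
        if subcollection_of is not None:
            self.dependents[subcollection_of].append(name)

    def delete(self, name, member):
        """Delete a member and propagate to every dependent collection."""
        self.collections[name].discard(member)
        for dependent in self.dependents[name]:
            self.delete(dependent, member)


db = Database()
db.add_collection("Persons", {"ann", "bob"})
db.add_collection("Students", {"ann"}, subcollection_of="Persons")

# Deleting ann from Persons also removes her from Students.
db.delete("Persons", "ann")
```

Deleting from Students, by contrast, leaves Persons unchanged, since the dependency runs in one direction only.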

In summary, the collection model presented in this thesis incorporates many of the favourable features of the relational, entity-relationship and semantic data models. It has direct support for the representation of relationships; it supports rich classification structures; and it has an operational model based on an algebra of collections.

1.3 Structure of Thesis

This thesis presents a general data model which may form a foundation for the development of data management services in object-oriented systems. It assumes as a platform any reliable, persistent object store and considers the provision of expedient data management through a structural and an operational model that together form the particular data model referred to as the collection model. This model can then be used as a basis for efficient data management by means of multiple physical representation structures and query optimisation techniques.

Hence, the focus of this work is on the data modelling aspects of object-oriented database systems. In particular, there is an attempt to redress the apparent imbalance in many proposed object-oriented database systems where the emphasis has been on effectual and efficient data management and the issue of expediency has been somewhat neglected. As a result, many of the existing systems provide little support for the concepts that have become central to the work on data modelling. Further, by omitting support for the higher-level database structures of data models, the attention to efficiency addresses access to individual objects or single collections of objects. By supporting database structures involving multiple collections of objects, and operations on these collections, optimisations can be made at a higher level which means that they tend to be more global and less localised. At this higher level, the optimisation techniques are better able to utilise semantic information of the application domain.

We begin in Chapter 2 with a discussion on the foundations of data models. A data model determines the basic constructs available for the construction of conceptual models of application domains. We therefore examine the general philosophical foundations of conceptual modelling as a basis for determining the basic requirements of data models. From these requirements, we present a general framework in which to consider the main characteristics of the various categories of data models. The [...]


Chapter 3 deals with the structural aspects of the collection model. The specification of the BROOM model is presented in four stages. Firstly, there is an informal overview which describes the main features of the model and looks at some simple examples. Next, the fundamental concepts on which the model is built, namely collections and collection families, are presented in detail. This is followed by a meta-circular description of the BROOM model in which the model is described in terms of itself. This description is used as an intermediate stage of specification which is refined into a formal specification in the language Z [Spi89], [Dil90], [PST91]. Such a meta-circular description is also useful both as a documentation aid for the model and as a basis for supporting the uniform treatment of data and metadata.

The semantic modelling capabilities of the BROOM model are examined in Chapter 4. First, the support for relationships is discussed in detail. Then each of the semantic data modelling abstractions referred to as aggregation, generalisation and association is examined with examples to demonstrate how these would be represented in the BROOM model.

The operational aspects of the collection model are presented in Chapter 5. Three levels of operation are possible and the chapter begins with an examination of these levels. The main theme of the chapter is the presentation of a collection algebra which deals with operations on collections. The properties of the algebra are presented, together with a discussion of how the associated algebraic transformations could be used in query optimisation.

A database is not a static entity but rather is dynamic in that it evolves over time. The entities represented will change and also the forms of their representations may change as entities adopt different roles throughout their lifetime. In addition, the structure of the database may evolve either to reflect changes in the real world systems that it models or because of changes to the requirements of the database system. In Chapter 6, we discuss the various forms of database evolution and how these can be supported. In particular, we propose an extension to the collection model to support object evolution.

The collection model was developed within the Comandos project [CBHdP93]. Comandos is an Esprit project concerned with the construction and management of distributed open systems. In Chapter 7, we describe how the collection model was realised as part of a Comandos system. The collection model was designed as a general model and is not specific to the Comandos system. To illustrate this point, we also describe a prototype object data management system, COLLEEN, which was based on the collection model and implemented in MacProlog [LPA91].


Chapter 2

Foundations of Data Models

The collection model proposed in this thesis is a particular data model which primarily was designed to support data management in object-oriented systems. Before going on to present this model, we first consider in some detail exactly what a data model is and what its requirements are, both in general and also in the specific context of object-oriented systems.

A data model supports the construction of a model of a data-intensive application system with the intention of representing that application domain by means of a database system. The process of constructing an application model using a particular data model is referred to as data modelling. We term the constructed model a conceptual model of the application domain. Such a conceptual model should be adequate in that it should capture the relevant features of the application domain, and, furthermore, it should be natural in that it should correspond to the sorts of mental models that users construct for mental processing.

The general area of study concerned with the construction of models which correspond directly and naturally to our own conceptualisations of reality is known as either conceptual modelling or cognitive modelling. The process of data modelling is a special case of conceptual modelling and it follows that the foundations of conceptual modelling are an important starting point in an attempt to determine the general requirements of data models. Therefore, this chapter begins with an examination of some of the philosophical foundations of conceptual modelling that are particularly pertinent to database systems and give some insight into the underlying basis for the proposed collection model.

Arising from these philosophical considerations, we arrive at some requirements for data models that in turn form the basis of a general framework for data models. We present this framework in section 2.2 and then go on to consider the various categories of existing data models in terms of this framework in section 2.3.
