A Collection Model for
Data Management in
Object-Oriented Systems
Moira C. Norrie
A thesis submitted for the degree of Doctor of Philosophy,
to the
Faculty of Science, University of Glasgow
December 1992.
Abstract
This thesis addresses the question of how to provide data management services in object-oriented systems with reliable persistent object stores. It proposes an object data model, called the collection model, which serves as a foundation for the construction of such services. The collection model is general in that it is independent of any particular implementation platform. In part, this independence is achieved through the separation of the data model from the underlying type model.
There are two components of the collection model - a structural model, BROOM, and an operational model based on an algebra of collections. The structural model is semantically rich and exhibits properties of both the entity-relationship and semantic data models. Unary collections are used to represent entity categories and binary collections to represent relationships between entities. Classification structures are based on the notion of a collection family which represents various forms of conceptual dependencies among the collections of a family.
The requirements for supporting the various forms of evolution in object-oriented database systems are presented. An extension to the collection model is proposed to support object evolution whereby objects can migrate within classification structures. Two existing realisations of the collection model are described. One is a prototype, single-user system implemented in Prolog. The other forms the basis of the Object Data Management Services of the Comandos platform for distributed, object-oriented applications.
Acknowledgements
Much of the work reported in this thesis forms part of the Esprit project 2071, Comandos. I acknowledge the contributions of the rest of the Comandos project group at Glasgow, particularly those of the ODMS group: they are Steve Blott, Colin Dunlop, David Harper, Arkadi Kosmynin and Andrew Walker. Special thanks are due to David Harper for taking on the responsibility for the ODMS group and the supervision of this thesis. His contributions to the design of the collection model were invaluable. The implementation of the ODMS was undertaken by Steve Blott and Andrew Walker and they, along with David, should take the credit for turning a dream into reality. Colin Dunlop and Arkadi Kosmynin were responsible for the development of database tools for use with the collection model. Further thanks are due to Steve for his comments on drafts of this thesis and to Colin for his considerable help with the diagrams.
Contents
1 Introduction
1.1 Database System Requirements
1.1.1 Effectual Data Management
1.1.2 Expedient Data Management
1.1.3 Efficient Data Management
1.2 The Collection Model
1.3 Structure of Thesis
2 Foundations of Data Models
2.1 Philosophical Foundations
2.2 A General Framework
2.3 Data Models
2.4 Object-Oriented Database Management Systems
3 The Structural Model: BROOM
3.1 An Overview of BROOM
3.2 Collections and Collection Families
3.2.1 Collection ...
3.3 A Metacircular Description of BROOM
3.4 A Z Specification of BROOM
4 Semantic Modelling in BROOM
4.1 Relationships
4.2 Aggregation, Association and Generalisation
5 The Operational Model
5.1 Operational Levels
5.2 A Collection Algebra
5.2.1 Operations on Collections
5.2.2 Operations on Binary Collections
5.2.3 Operations on Ordered Collections
5.2.4 Operations on Sets
5.2.5 Operations on Bags
5.2.6 Specifying Collection Operations in Z
5.3 Example Queries
5.4 Properties of the Collection Algebra
6 Evolution
6.1 Object Creation and Deletion
6.2 Object Evolution
7 Realising the Collection Model
7.1 COLLEEN
7.2 The Comandos ODMS
8 Conclusions
8.1 Contributions
Chapter 1
Introduction
With the advent of persistent programming systems and persistent object stores, some of the functionality of a database management system (DBMS) is provided by a general persistent system. We are forced to ask ourselves questions such as: “What functionality of a DBMS is not supported by a persistent store?”, “What distinguishes a database system from a persistent system?”.
To answer these questions, we must first examine the basic requirements of a database system. This enables us to establish a clear characterisation of both database systems and persistent systems and thereby assert the additional functionality required of a database management system. We can then address the issue of how to provide this functionality of a database management system in an object-oriented system with a persistent object store.
1.1 Database System Requirements
A database system is a software system which supports a data-intensive application. In the 60’s and 70’s, such applications tended to be relatively simple and straightforward in that the structure and use of data was regular and well-known. For example, database systems were developed for stock control, payroll and patient record applications.
... Aided Design and Computer Assisted Software Engineering. These application systems model complex human processing systems and support a wide range of activities. For example, an Office Information System encompasses support for all office activities rather than supporting a single activity such as the maintenance of personnel records.
To provide computer support for an activity, we must first be able to describe that activity and, in the case of complex human activities such as those of designing a ship or producing a large software system, it is an intricate task to produce such a description. One of the aims of the designers of a database management system is to provide the database application system designer with a set of concepts that assist the construction of such an application model.
The main characteristic of all data-intensive applications is the representation of a large number of interrelated entities of the application domain. Each entity will be represented in the database system by a value or an association of values. Here, the term ‘value’ is used in its most general sense to mean any valid data item of a particular system. A value may be a simple value, such as the integer 3 or the string john, or a complex record or object value.
Instead of an entity being represented directly by a single value, it may be the case that it is represented by an association of values. For example, a particular person might be represented by the association of the string value john and the string value 24 High Street. Associations of values are also used to represent relationships between entities. If a particular person entity is represented by the value p, and a particular department entity is represented by the value d, then the fact of that person being employed by that department may be represented by the association (p, d).
The form of representation of an entity, or a relationship between entities, will depend on the forms of values and associations of values supported in a particular system. An object-oriented system can represent entities directly by means of object values whereas a relational database system only supports simple values and generally entities of the application domain have to be represented through associations of values held as tuples of relations. Further, for a given system, the database designer may have a choice as to the form of representation of particular entities and relationships.
It follows that a database system is concerned with the management of a large number of values and associations between values. The general requirements of such a system can be expressed in terms of ‘the four e’s’. If a database system is to be effective, the management of values must be effectual, expedient and efficient.
... system whether that user be a database administrator, application programmer or end-user. For a system to be effective, it must also be efficient in that operations must be performed within an acceptable time period.
Note that these three requirements of database systems are not independent. Further, some features may contribute to more than one requirement. For example, support for data independence which divorces the logical and physical descriptions of data has benefits in terms of both user convenience and efficiency of both system operations and programmer productivity.
Each of these three requirements is now examined in some detail.
1.1.1 Effectual Data Management
If a database system is to satisfy its intended purpose, it is fundamental that it ensures the validity of the data values that it manages. This means that the system must protect the data against loss or corruption due either to failure or user abuse. Furthermore, the data values must be current in that the database must reflect the effects of successfully completed operations on the database. Thus, there must be some form of long-term, reliable storage of data values in order that the database will persist between application program executions and beyond machine shutdowns or failures.
For values to persist beyond program execution, there must be some form of persistent address space. In conventional systems, the logical unit of persistence supported by the operating system is the file and the persistent address space is controlled by the file manager.
Most general purpose programming language systems adopted this basic model of persistence by having a single form of persistent value - the file. Usually, a file corresponds to a sequence of uniformly formatted records. It was recognised that there are two major problems with such a system. Firstly, all data values which the programmer wishes to persist must be converted to an appropriate file format. Secondly, since the persistent data is effectively stored in raw format, there are no mechanisms to ensure that the data is used ‘safely’.
To overcome these problems, there arose the notion of a persistent programming language in which any form of data value could persist. Two of the first persistent programming languages were developed independently in the East and West by Zamulin [Zam77, Zam78, Zamdd] and Atkinson [ACC81, ABC+83], respectively.
... file structures and vice versa. Hence, the use of persistent programming languages for data-intensive applications reduced both programming effort and program code. Further, since these systems store type information along with the data values in the persistent store, they provide support for ensuring the safety of data use, i.e. that it is used in a correct and meaningful way.
To ensure the resilience of the system to machine failures, the persistent values must be held on non-volatile storage. Magnetic disks are the most widely used non-volatile medium for the storage of large quantities of current, updatable data values. We distinguish current data values from archive or back-up data which may be held on magnetic tape. Also, we distinguish databases which may be updated from those that are a read-only information store and may be held on a read-only medium such as CD-ROM. We note that with the development of systems with large non-volatile RAM, there are a number of research projects investigating main memory database systems in which all persistent values are held in RAM.
Therefore, typically the persistent values of a data-intensive application system will be held on magnetic disk. The persistent values associated with an application system are termed, collectively, the database of the system. In some persistent programming languages, it is possible to access more than one database in a program and therefore the persistent values of an application system may be stored as one or more databases. Those persistent values accessed by an application will be mapped into the program’s virtual address space as required.
As stated previously, the system must ensure that the effects of successfully completed operations are reflected in the database. On the other hand, the system must prevent operations which fail to complete successfully from leaving the database in an inconsistent state. It is therefore necessary to introduce the notion of a logical unit of processing - the transaction. A transaction is an atomic operation on the database in that either it completes successfully and all of its effects are reflected in the database or it fails to complete and none of its effects are reflected in the database.
A successful transaction completes with a commit operation. All persistent values which have been created or updated in the transaction’s virtual address space will be mapped into the persistent address space. If the machine were to fail while the persistent values are being written to the non-volatile storage, some of these updates could be lost and the database left in an inconsistent state. To prevent this, a log of the transaction’s processing and outcome is maintained and, at the point of commit, this log is written onto stable storage. In the event of machine failure before the updates are written to the database, examination of the log record during recovery shows that the decision was taken to commit and the new database state can be constructed from the information in the log.
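The commit and recovery behaviour described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions rather than the mechanism of any particular system: the names Store, commit and recover are hypothetical, Python dictionaries and lists stand in for non-volatile and stable storage, and the flag crash_before_apply simulates a machine failure between writing the log and updating the database.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy persistent store (hypothetical): both fields stand in for non-volatile storage."""
    database: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # plays the role of stable storage

    def commit(self, txn_id, updates, crash_before_apply=False):
        # At the point of commit, the log record is written before the database is touched.
        self.log.append({"txn": txn_id, "updates": dict(updates), "outcome": "commit"})
        if crash_before_apply:
            return  # simulated machine failure: the database was never updated
        self.database.update(updates)

    def recover(self):
        # The log shows which transactions decided to commit; replay their updates in order.
        for record in self.log:
            if record["outcome"] == "commit":
                self.database.update(record["updates"])


store = Store(database={"balance": 100})
store.commit("t1", {"balance": 80}, crash_before_apply=True)
before_recovery = dict(store.database)
store.recover()
after_recovery = dict(store.database)
```

After the simulated failure the database still holds the old value; replaying the log during recovery reconstructs the committed state. Because the log records new values rather than increments, replaying it more than once is harmless in this sketch.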
... the transaction’s virtual address space will not be mapped into the persistent address space. In this way, the database is unaffected by any processing performed by the transaction before abortion.
One of the early motivations for database systems was the separation of data from programs, thereby allowing a central data repository which could be shared by a number of application programs. This notion can be generalised to the separation of knowledge from the application of knowledge. Thus, there can be a shared repository of knowledge about some application domain and a particular application program can utilise whichever parts of this knowledge repository it requires and apply this knowledge in whatever way it chooses.
This generalisation is introduced here to emphasise that the same principles of persistence apply whatever forms of values may persist. Thus, some persistent programming languages, such as PS-algol [ACC81], [ABC+83] and Napier [MBCD89], have introduced procedures as first-class objects and one form of persistent value is the procedure. Similarly, a deductive database system is based on the logic programming paradigm in which knowledge about the application domain is recorded as a set of facts and a set of general rules. Therefore, the persistent store of a deductive database system must store both values which are facts and values which are rules. In object-oriented systems, an object has associated methods and the implementations of these methods must also persist.
Whatever the forms of persistent values supported by a system, it may be a requirement of the system that these values can be shared by a number of application programs. There must be some mechanism to ensure that two or more application programs accessing the same persistent store do not interfere with each other’s operation. An application program may incorporate one or more transactions where a transaction is an atomic operation on the database. Hence, what is really required is some form of control to prevent conflict among concurrent transactions which operate on the same persistent store. This means that the effects on the database and the output of the transactions must be the same as if these transactions had run in isolation.
The system must support some form of concurrency control mechanism which either restricts access to the persistent values or restricts the commit operation of transactions in such a way that the effect of processing a set of transactions concurrently is equivalent to the effect of having processed these transactions one after the other in some specified order, i.e. the transactions are serializable.
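The requirement of equivalence to some serial order can be made concrete with a small sketch. It is a deliberately simplified illustration: it compares only the final database state for one initial state, which is weaker than the formal notions of serializability in the literature, and the names serial_outcomes, add_ten and double are hypothetical.

```python
from itertools import permutations


def serial_outcomes(initial, transactions):
    """Final states reached by running the transactions one after the other, in every order."""
    outcomes = []
    for order in permutations(transactions):
        state = dict(initial)
        for transaction in order:
            transaction(state)
        outcomes.append(state)
    return outcomes


def add_ten(state):
    state["x"] = state["x"] + 10


def double(state):
    state["x"] = state["x"] * 2


initial = {"x": 1}
allowed = serial_outcomes(initial, [add_ten, double])

# An uncontrolled interleaving: both transactions read x before either writes,
# so the update made by add_ten is overwritten and lost.
read_by_add_ten = initial["x"]
read_by_double = initial["x"]
interleaved = dict(initial)
interleaved["x"] = read_by_add_ten + 10
interleaved["x"] = read_by_double * 2

is_equivalent_to_serial = interleaved in allowed
```

The two serial orders yield x = 22 and x = 12, whereas the interleaving yields x = 2, so a concurrency control mechanism would have to prevent it.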
... when it arises and recover from it by the rather costly process of aborting transactions. This can be achieved by having transactions operate on their own local copies of persistent values and then checking before the commit operation if their operation is in conflict with that of any transactions that have committed during that transaction’s processing. In the case of conflict, the commit operation is not allowed to proceed. In general, the pessimistic approach is preferred where the likelihood of conflict is high and the optimistic approach is preferred where the likelihood of conflict is low.
In summary, effectual data management requires some form of persistent store and transaction management. The transaction management should at least provide some form of recovery mechanism and, if the persistent store is to be sharable, then it must also support some form of concurrency control mechanism. Many variants of mechanisms for recovery and concurrency control have been proposed and here only a very brief description of their requirements has been provided. A good comprehensive description of the various issues and proposed mechanisms is given in the book by Bernstein, Hadzilacos and Goodman [BHG87].
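The optimistic approach described in this subsection, in which transactions work on local copies and are checked for conflict before commit, might be sketched as follows. This is one possible realisation under stated assumptions, not a mechanism taken from the thesis: conflicts are detected with per-value version numbers, and the names OptimisticStore and Transaction are hypothetical.

```python
class OptimisticStore:
    """Toy shared store; each value carries a version number (an assumption of this sketch)."""

    def __init__(self, values):
        self.values = dict(values)
        self.versions = {key: 0 for key in values}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.seen = {}   # version of each value when it was first read
        self.local = {}  # the transaction's own local copies

    def read(self, key):
        if key not in self.local:
            self.seen[key] = self.store.versions[key]
            self.local[key] = self.store.values[key]
        return self.local[key]

    def write(self, key, value):
        self.read(key)  # take a local copy first so that the write is validated
        self.local[key] = value

    def commit(self):
        # Validation: refuse to commit if any value used here has been changed
        # by a transaction that committed in the meantime.
        if any(self.store.versions[k] != v for k, v in self.seen.items()):
            return False
        for key, value in self.local.items():
            if self.store.values[key] != value:
                self.store.values[key] = value
                self.store.versions[key] += 1
        return True


store = OptimisticStore({"stock": 10})
first, second = Transaction(store), Transaction(store)
first.write("stock", first.read("stock") - 1)
second.write("stock", second.read("stock") - 2)
first_ok = first.commit()
second_ok = second.commit()
```

Here the second transaction fails validation because the first committed a change to the same value during its processing; it would have to be aborted and rerun.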
In addition to the above, there should also be security mechanisms to prevent unauthorised access to data. The two basic forms of access control mechanisms are list-based or token-based. List-based mechanisms associate with a persistent value (or group of values), a list of authorised users and their access privileges. A token-based system permits access to a persistent value (or group of values) by users with the appropriate token. The second kind of access control is the basis of the capability systems [LS76].
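The contrast between the two forms of access control can be sketched briefly. The class names ListBasedGuard and TokenBasedGuard, the privilege strings and the use of random hexadecimal tokens are all assumptions made for illustration; real capability systems protect tokens in ways not shown here.

```python
import secrets


class ListBasedGuard:
    """Each persistent value has a list of authorised users and their privileges."""

    def __init__(self):
        self.access_lists = {}  # value name -> {user: set of privileges}

    def grant(self, value, user, privilege):
        self.access_lists.setdefault(value, {}).setdefault(user, set()).add(privilege)

    def allows(self, value, user, privilege):
        return privilege in self.access_lists.get(value, {}).get(user, set())


class TokenBasedGuard:
    """Access is permitted to whoever presents an appropriate token, whatever their identity."""

    def __init__(self):
        self.tokens = {}  # token -> (value name, privilege)

    def issue(self, value, privilege):
        token = secrets.token_hex(16)
        self.tokens[token] = (value, privilege)
        return token

    def allows(self, token, value, privilege):
        return self.tokens.get(token) == (value, privilege)


by_list = ListBasedGuard()
by_list.grant("Persons", "alice", "read")

by_token = TokenBasedGuard()
read_token = by_token.issue("Persons", "read")
```

In the list-based form the check consults the identity of the user; in the token-based form it consults only the token presented, so passing on the token passes on the right of access.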
Distributed systems require further extensions to the persistent store and transaction mechanisms. The persistent address space spans several nodes and a transaction might access persistent values across the nodes of this address space. Persistent values might be replicated at different nodes to increase the levels of locality of access and availability of data. Consequently, the recovery, concurrency control and security mechanisms have to be extended to take this into account. For example, some variant of the two-phase commit protocol might be used to ensure that the effects of a transaction are committed at all participating nodes or none of the participating nodes.
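The all-or-none property of two-phase commit can be illustrated with a minimal sketch. It omits everything that makes the protocol hard in practice (message loss, timeouts, coordinator failure and logging at each node), and the names Node and two_phase_commit are hypothetical.

```python
class Node:
    """A participating node; can_commit models whether it is able to commit its part."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the node votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(nodes):
    votes = [node.prepare() for node in nodes]  # phase 1: collect every vote
    if all(votes):
        for node in nodes:                       # phase 2: commit everywhere
            node.commit()
        return True
    for node in nodes:                           # phase 2: abort everywhere
        node.abort()
    return False


healthy = [Node("a"), Node("b"), Node("c")]
healthy_result = two_phase_commit(healthy)

mixed = [Node("a"), Node("b", can_commit=False), Node("c")]
mixed_result = two_phase_commit(mixed)
```

A single negative vote in the first phase causes every node to abort, so the effects of the transaction appear at all participating nodes or at none.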
1.1.2 Expedient Data Management
To assist the users in their maintenance and retrieval activities, it is vital that they be presented with a conceptual model of the application reality. This conceptual model is a metalevel description in terms of the general concepts of interest in the application domain. It is the task of the database designer to construct the conceptual model.
The set of general concepts will be based upon the recognition of the existence of similarities among entities and their associations. An entity has a set of properties each of which may be either structural or operational. From now on, when we refer to the properties of an entity, it is assumed that we refer only to those properties of interest in a particular application system rather than all properties exhibited by the entity. A structural property is referred to as an attribute of the entity and it has an associated value. For example, an entity might have a name attribute with the value john. The set of structural properties, or attributes, of an entity determines the form of the entity. An operational property specifies an operation that the entity can perform. The set of operational properties of an entity characterises its behaviour.
If a set of entities have similar form and behaviour, then a general description of the properties of these entities may be introduced and these entities will have a common representation in the database. In this way, there is a move from the particular to the general and the introduction of a metalevel description of the representation of these entities referred to as a type. A type gives a general description of a value in terms of the properties that value must hold and we say that a value with those properties is an instance of that type.
A system may provide up to four basic kinds of types. Firstly, the system may support a number of primitive types for which the structural and operational properties are predefined. For example, commonly the primitive types integer and boolean are available. Secondly, the system may support a number of structured types for which the operational properties are predefined but the structural properties are not. For example, record and enumerated types are structured types; they have fixed operations such as selectors on records - but the structure of the records is specified by the user. Thirdly, the system may support operational types in which the structure is fixed and operational properties are specified by the user. Examples of these are procedure or function types. Fourthly, the system may support abstract types for which both structural and operational properties have to be specified by the user. These are abstract data types or object types.
Note that just as an entity may be a particular of several general concepts, its representation value may have properties additional to those of a given type and it is therefore possible for a value to be considered as an instance of many types - provided that it has the properties of each of those types.
... operations on entities. An operation is external to an entity if it is not a property of that entity but rather a property of an encompassing entity. For example, in a library system the operation of borrowing a book may be viewed - not as an operation of a borrower or of a book - but rather as an operation of the encompassing library entity; then the borrow operation would be external to book entities. These external operations characterise the envisaged ‘usage’ of an entity. We adopt the term category to refer to such entity groupings.
The classification of entities is represented in the database by named collections of values representing the entity categories. A metalevel description of a collection is given by a collection scheme (cf. relation scheme) which specifies the name and nature of the collection and the type of its members. The nature of a collection determines whether a collection may contain more than one occurrence of any value and whether the member values are ordered; it corresponds to a particular form of bulk data type such as a set, ordered set or bag.
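The idea of a collection scheme, with a name, a nature and a member type, can be sketched as a small data structure. This is an illustrative reading of the paragraph above and not the thesis's own notation: the names CollectionScheme, NATURES, conforms and equal_under are hypothetical, and Python types stand in for the underlying type model.

```python
from collections import Counter
from dataclasses import dataclass

# nature -> (may contain repeated occurrences, member values are ordered)
NATURES = {
    "set": (False, False),
    "ordered set": (False, True),
    "bag": (True, False),
}


@dataclass(frozen=True)
class CollectionScheme:
    name: str
    nature: str
    member_type: type

    def conforms(self, members):
        """Check the member type and, where the nature forbids them, repeated occurrences."""
        members = list(members)
        if not all(isinstance(member, self.member_type) for member in members):
            return False
        allows_duplicates, _ = NATURES[self.nature]
        return allows_duplicates or len(set(members)) == len(members)


def equal_under(scheme, left, right):
    """Compare two member listings in the way the nature of the collection dictates."""
    allows_duplicates, ordered = NATURES[scheme.nature]
    if ordered:
        return list(left) == list(right)
    if allows_duplicates:
        return Counter(left) == Counter(right)
    return set(left) == set(right)


persons = CollectionScheme("Persons", "set", str)
queue = CollectionScheme("Queue", "ordered set", str)
visits = CollectionScheme("Visits", "bag", str)
```

For example, the listing fred, john and the listing john, fred denote the same collection under the scheme Persons but different collections under Queue, while only Visits accepts fred twice.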
These categories of entities are not independent since they form part of a classification structure for the application domain. Classification structures are represented in the conceptual model by the specification of conceptual dependencies among the collections. Further, certain categories of entities will be related through the sorts of relationships in which their members may participate and this is also represented in the conceptual model by conceptual dependencies among collections.
A conceptual model may therefore be regarded as having a three-level structure. At the lowest level there are the general descriptions of individual entities of the application domain in terms of types. These entities are grouped together into categories according to their roles in the application domain and this is modelled in the conceptual model by collection schemes which describe the collections of values representing these categories. Finally, the categories are related to each other both through their participation in classification structures, and, in terms of the relationships that may exist between their members; this is described in the conceptual model as constraints on collections. This structuring of the basic notions of conceptual models is illustrated in figure 1.1.
The application reality contains person entities and house entities. These entities are grouped into categories of persons and homes, respectively. Then we wish to represent the relationships between persons and the homes they live in.
[Figure 1.1: diagram relating the application reality to the conceptual model and the database]
Application Reality: entities, categories, relationships (modelled by the Conceptual Model, represented by the Database)

Conceptual Model                    (link)            Database
types                               described by      values
  person                                                person: fred, john, mary
  house                                                 house: brick, straw
collection schemes                  instantiated by   collections
  Persons : set of person                               Persons {fred, john, mary}
  Homes : set of houses                                 Homes {brick, straw}
  Lives : set of [person, houses]                       Lives {(fred, brick), (john, straw), (mary, straw)}
constraints
  Lives <-> Persons to Homes
Then the database is an instantiation of the conceptual model in that it represents particular entities of the application reality, their roles and the relationships between them. Here it is assumed that a person entity is represented by a simple name value e.g. john, and similarly, for a house entity e.g. brick. Then the collection Persons is the set {fred, john, mary}, and the collection Homes is the set {brick, straw}. The relationship between persons and the homes they live in is represented by collection Lives which is a mapping from Persons to Homes. For example, (fred, brick) is a member of Lives and this represents the fact that fred lives in the brick house. For a given database to be an instance of a conceptual model, the collections of the database must be as specified in the collection schemes and must satisfy the constraints of the model.
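The example database above, and the check that it is an instance of the conceptual model, can be sketched in a few lines. The sketch assumes that name values are strings and interprets the constraint on Lives only as requiring each pair to link a member of Persons to a member of Homes; whether each person must have exactly one home is not assumed. The function names are hypothetical.

```python
persons = {"fred", "john", "mary"}
homes = {"brick", "straw"}
lives = {("fred", "brick"), ("john", "straw"), ("mary", "straw")}


def members_have_type(collection, member_type):
    """Collection scheme check: every member is a value of the stated type."""
    return all(isinstance(member, member_type) for member in collection)


def relates(binary, source, target):
    """Constraint check: every pair links a member of source to a member of target."""
    return all(left in source and right in target for left, right in binary)


def is_instance_of_model(persons, homes, lives):
    return (
        members_have_type(persons, str)
        and members_have_type(homes, str)
        and members_have_type(lives, tuple)
        and relates(lives, persons, homes)
    )


valid = is_instance_of_model(persons, homes, lives)
# Adding a pair that mentions a house outside Homes violates the constraint on Lives.
invalid = is_instance_of_model(persons, homes, lives | {("fred", "sticks")})
```

The first database satisfies both the collection schemes and the constraint; the second does not, because sticks is not a member of Homes.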
T h e conceptual m odel is co nstru cte d by the database designer and is th e basis fo r the use o f the a p p lic a tio n system . T h e database designer specifies the co nce p tu a l m odel in term s o f a conceptual m o d e llin g language. To assist the database designer in th e ir task and also to ensure th a t the re s u ltin g conceptual m odel is u nd e rstoo d by the users, th is language m ust p ro vide a. num be r o f general co n stru cts th a t arc adequate for the d e s c rip tio n o f da.ta,-intensive a p p lic a tio n system s. T h e key to m a k in g such a language e ffective is th a t it should be sim ple: th is means th a t it should be based on a sm all num be r o f easily understood concepts a.nd should be o rth o g o n a l in the a p p lic a tio n o f these concepts.
Since the conceptual model imposes a structure on the values of the database, the conceptual modelling language is commonly referred to as a data modelling language - and the constructs on which it is based as a 'data model'. Thus, a data model could be regarded as a database designer's toolkit in that it provides the basic components for constructing a conceptual model. The term 'data model' is somewhat confusing since it is not a model but rather a theory for which models may be constructed. However, the term is in common usage and therefore will be adopted here. But we would like to emphasise the distinction that will be used between the terms 'data model' and 'conceptual model'. Here, we use 'data model' to mean the set of basic constructs and the term 'conceptual model' to mean the model of a particular application reality that is expressed in terms of these constructs. The conceptual model of a database is often referred to as the database schema.
1.1.3
Efficient Data Management
An application system could be designed from scratch and thereby tailored for the intended applications to obtain optimal performance. However, in general, this would be a very expensive solution and, in the long term, may prove very inefficient as it does not cater for system evolution. A database system will evolve over time in terms of its use, its requirements, its structures and its values. This means that instead of 'hardwiring' a system to specific database and application characteristics, it is desirable that the system is adaptable to change in such a way that good performance is still attained.
A database may contain very large collections of values and these may be part of a complex overall structure. Indeed, the values which represent individual entities of the application domain may themselves be large and/or complex. The size and complexity of the database can result in very high costs for retrieval operations. To be borne in mind is the fact that a so-called retrieval operation may not be a simple look-up operation - but may involve complex processing operations on values in the database. As databases are increasingly becoming an integral part of complex application systems, such as scientific processing and design systems, such processing operations can be expensive both in space and time.
Update operations may also be expensive not only in terms of the time to locate the values required and any processing costs, but also in terms of ensuring that the overall consistency of the database is maintained. Thus, an update operation may incur high overheads such as constraint checking activities.
If the system takes several minutes (or even hours) to perform the required operations, then effectively the system may be unusable. This is particularly true in transaction processing systems, such as airline reservation systems, that typically involve a large number of small transactions with a low critical response time.
Different applications may have conflicting requirements. Thus the location and/or representation of collections of values suited to one application may differ from that appropriate to another application. It may be possible to meet the requirements of both applications through replication of data or multiple representations but this then introduces additional overheads in ensuring the consistency of the database. It is important to remember then that the objective of the person responsible for the system must be the overall efficiency of the system rather than the optimal solution with respect to any one application.
reasonable standards of efficiency regardless of specific data and application characteristics. This can be done through the provision of support for a small number of general constructs and various implementations of these constructs to suit data and application characteristics. Further, operations on a database should be specified at a logical level which is independent of physical representation and implementation: the system can then determine the methods of evaluation through consideration of the data characteristics and the current forms of representation. Hence, as the database system evolves, the underlying representations and implementations can be evolved in turn without recourse to the application programmers or end users.
The requirements for efficient data management are not orthogonal to those for expedient data management. Both require the notion of a data model which provides a small number of constructs in terms of which one can model the application domain with relative ease. Also, they both require the notion of a query language which enables the user to specify operations on the database at a logical level in terms of the conceptual model and independent of physical representation.
A given collection of values may have a number of possible representations. The most appropriate representation will depend on the size of the collection in terms of the number of values it contains, the characteristics of the member values and the envisaged operation characteristics. If a collection contains only a few members, then a very simple representation such as a linked list of values may suffice. If, however, a collection is large then a linear search to retrieve a particular value would be unacceptable and therefore some form of index structure would be maintained.
Typical index structures employ variants of hashing techniques or B-trees (or some combination of these). An index structure is built over some property or combination of properties of the values of the collection depending on access patterns. For example, in a collection of person objects, if there are lots of selection operations based on the surname of a person then it is reasonable to construct an index of the collection based on the surname values. The choice of index structure can take into consideration the stability of the collection and also the characteristics of the properties over which an index is constructed. For example, some forms of index structure are easily expanded as the size of a collection grows, whereas others incur significant overheads if the variation in the size of the collection is high. Further, the distribution of the property values is significant in the selection of an index structure.
It is, of course, possible to maintain several index structures over a single collection. But it is important to remember that the overheads of maintaining an index structure can be high as the index will have to be updated as values are updated. For this reason, it is clearly preferable to maintain index structures over relatively stable properties.
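The trade-off between a simple list representation and an index on surname can be sketched as follows. This is an illustrative sketch only: the class `IndexedCollection`, its method names and the use of dictionaries as person records are hypothetical, and a Python `dict` stands in for the hashing techniques mentioned above.

```python
from collections import defaultdict


class IndexedCollection:
    """A collection of person records with a hash index on surname (sketch)."""

    def __init__(self):
        self.members = []                     # simple list representation
        self.by_surname = defaultdict(list)   # index: surname -> records

    def insert(self, person):
        self.members.append(person)
        # Every insertion also pays for keeping the index up to date.
        self.by_surname[person["surname"]].append(person)

    def rename(self, person, new_surname):
        # Updating an indexed property means updating the index as well,
        # which is why stable properties make cheaper index keys.
        self.by_surname[person["surname"]].remove(person)
        person["surname"] = new_surname
        self.by_surname[new_surname].append(person)

    def select_scan(self, surname):
        # Linear search: cost grows with the size of the collection.
        return [p for p in self.members if p["surname"] == surname]

    def select_indexed(self, surname):
        # Index look-up: avoids visiting every member.
        return list(self.by_surname.get(surname, []))
```

Both selection methods return the same members; they differ only in how much work is done to find them and in the upkeep cost paid on each update.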
index structure over a given collection may be made either by the database system administrator, who has overall responsibility for the system, or by the system itself. It is possible for a system to gather statistics on access patterns and, on the basis of these, decide when to construct a new index structure - or delete an existing one. At present, such systems are rare; however, in a number of systems an index may be constructed purely for the evaluation of a particular query.
The method of evaluation of a particular operation on a collection, or set of collections, depends upon the representation of those collections and also the characteristics of those collections. For example, if an operation is to select specific objects from a collection based on the values of some property of those objects, then the method of evaluation depends upon whether or not that collection has an index structure on that property. The size of a collection is also significant in selecting the implementation of an operation.
A given query on a database is specified in a query language in terms of the conceptual model. This query can be translated into an algebraic expression in terms of operations on collections. Then the selection of an evaluation plan for the query expression consists of two stages. First, the query expression is transformed into an equivalent expression based on the algebraic properties of the operations. This stage is generally known as logical optimisation. Then an evaluation plan is constructed for the transformed query expression and this plan takes into account the physical representation of the collections and also their characteristics. This stage is known as physical optimisation.
The issues of query optimisation are well understood in the area of relational database systems. These were the first systems to make a clear separation between the logical and physical levels of a database system such that the end-users interact with the database only in terms of its conceptual model (or views thereof) and the system deals with the translation of operations specified at the user level into operations at the physical level.
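How a system might pick between evaluation methods for a selection can be sketched as below. The function `select_equal`, its parameters and the size threshold `small` are all hypothetical choices made for this illustration; a real system would base the decision on gathered statistics rather than a fixed cut-off.

```python
def select_equal(members, prop, value, indexes=None, small=8):
    """Select the members whose property `prop` equals `value`.

    `indexes` maps a property name to a dict from property value to the
    list of members holding that value. `small` is an arbitrary cut-off:
    at or below this size a plain scan is used even if an index exists.
    Returns a pair (method_used, selected_members).
    """
    indexes = indexes or {}
    if prop in indexes and len(members) > small:
        # An index exists on the property and the collection is large.
        return "index", list(indexes[prop].get(value, []))
    # No usable index, or the collection is small enough to scan.
    return "scan", [m for m in members if m[prop] == value]
```

The caller states only what is to be selected; which method is used is decided from the representation and the size of the collection.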
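A small illustration of the logical optimisation stage is given below. It relies on a standard set-algebra identity, that selection distributes over union, which is assumed here for illustration and is not taken from the collection algebra of this thesis; the collections and the predicate are likewise hypothetical.

```python
def select(pred, coll):
    """Return the members of `coll` that satisfy the predicate `pred`."""
    return {x for x in coll if pred(x)}


# Hypothetical collections of (name, age) values.
staff = {("ann", 52), ("bob", 31)}
students = {("cat", 22), ("bob", 31)}


def over_30(record):
    return record[1] > 30


# Original expression: select over the union of two collections.
original = select(over_30, staff | students)

# Equivalent expression after transformation: select from each collection
# first, then take the union. Each inner selection could now be evaluated
# with whatever representation its own collection happens to have.
rewritten = select(over_30, staff) | select(over_30, students)
```

The two expressions denote the same collection, so a physical optimisation stage is free to evaluate whichever form is cheaper.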
1.2
The Collection Model
From the foregoing discussion on database system requirements, it should be clear that while a general persistent programming language supports effectual data management, it does not support all of the features for expedient and efficient data management required by data-intensive applications. A persistent programming system provides support for the persistence of individual values, but it does not provide explicit support for the notion of a database as a representation of an application domain in terms of interrelated collections of values. In other words, the persistent system has no notion of a data model as discussed in the section on expedient data management, and there is no distinction between the logical and physical levels of representation. Clearly, as they have no notion of a database, they also have no notion of a query language that expresses operations on a database.
It is important to emphasise that although some persistent systems have been extended to support collections, this still falls short of our database system requirements as there is no explicit support for expressing the sorts of conceptual dependencies among these collections required to model classification structures and relationships. Further, it is important to emphasise that we are not presenting these as deficiencies of persistent systems but rather are highlighting the distinction between database systems and persistent systems. A database system is based on a persistent system - but it has additional facilities appropriate for the support of data-intensive applications. Indeed, a persistent programming language extended to support the notions of a data model and query language is a database programming language. For example, the database programming language Galileo [ACO85] is a persistent programming language with semantic data model features and abstraction mechanisms designed to support database application programming.
In this thesis, we address the questions of what these additional facilities should be and how they should be provided in the context of object-oriented systems. The proposed collection model is intended as a general model on which to base the provision of data management support in an object-oriented system. Although the model was developed in the context of a particular object-oriented platform, the general model is independent of the underlying persistent object system. In this way, the reported work differs from that of database programming languages such as Galileo in that the data model is not tightly integrated with a specific programming language.
We separate out these two parts of the collection model to emphasise that while the operational model is dependent on the structural model, the structural model can be supported independently of the operational model. While we advocate the extensive use of high-level queries in application programming, the application programmer can choose to adopt the structural model as a means of modelling their application domain and then use basic iterators and navigational techniques to implement their applications directly. This incurs penalties in terms of supporting system evolution in that the application code is then dependent on the physical representation and may not be able to take advantage of new index structures. However, this may be appropriate to certain applications and certainly the use of the structural model alone is still beneficial.
The basic concepts supported in the structural model are entity categories, relationships between entities and rich classification structures both of entity categories and relationships. An entity category is represented by a collection of atomic values. These atomic values may be any values supported by the underlying type system and, in the case of objects, these will be object references. Entity relationships form relations between entity categories and these are represented by collections of pairs of atomic values. Thus a relationship between two objects will be represented by a pair consisting of the references of those objects.
Both entity categories and relations between categories can be part of classification structures. These classification structures allow entities to be considered as belonging to different roles in the application. For example, a person entity might at one time be considered as a staff entity, at another time as a lecturer entity, and at yet another time as a tennis player. There are conceptual dependencies between these roles to indicate, for example, that lecturer is a specialised role of staff and therefore every lecturer entity is also a staff entity. In a similar way, relations can also be specialised. For example, given relationships between persons and their associated departments, then the relationships between staff and the departments which employ them would be a specialisation of the more general association.
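The role and specialisation dependencies described above can be illustrated with plain sets. This is a sketch under stated assumptions: the member values, the collection names `AssociatedWith` and `EmployedBy`, and the helpers `specialises` and `roles_of` are hypothetical, and specialisation is read simply as set inclusion.

```python
# Hypothetical unary collections (entity categories).
Persons = {"ann", "bob", "cat"}
Staff = {"ann", "bob"}
Lecturers = {"ann"}

# Hypothetical binary collections (relationships) as sets of pairs.
AssociatedWith = {("ann", "cs"), ("bob", "maths"), ("cat", "cs")}
EmployedBy = {("ann", "cs"), ("bob", "maths")}


def specialises(sub, sup):
    """Assumed reading: `sub` specialises `sup` if it is a subcollection."""
    return sub <= sup


def roles_of(entity, categories):
    """List, in alphabetical order, the categories that contain `entity`."""
    return sorted(name for name, members in categories.items()
                  if entity in members)


categories = {"Persons": Persons, "Staff": Staff, "Lecturers": Lecturers}
```

With these definitions, ann can be viewed as a person, a member of staff and a lecturer, while the employment relation is a specialisation of the more general association and involves only staff.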
The structural model was given the name BROOM (Binary Relational Object-Oriented Model) to emphasise the importance of support for the direct representation of relationships between entities. Although very simple, the entity-relationship model [Che76] has proved very popular in the modelling of the structural properties of application domains. The basic concepts in this model are entity categories and relationships.
methods of these objects and therefore are effectively buried within objects, the overall structure of the application domain is not readily apparent. By decomposing relationships in this way, we cannot handle a relationship as a single logical unit. As Rumbaugh states:
“... it is not possible to separate the abstraction from the implementation with the same clarity as the relational model.”
Further, in the design of large systems, relationships have been shown to be a useful abstraction mechanism for partitioning systems into subsystems. Recently, there have been a number of proposals for some form of extension to object-oriented models to support relationships as first-class objects.
The semantic data models [HK87], [PM88] might be considered as a development of the entity-relationship models that support classification structures based on is-a relationships between entity categories. Since one of the fundamental concepts of object-oriented data models is that of subtyping and inheritance, these are often considered to support classification structures. However, they often omit support for the rich conceptual dependencies that can arise in classification structures - such as categories partitioning other categories, the fact that certain categories are mutually exclusive and also the idea of supporting alternative classification views. These have been incorporated into the BROOM model through the concept of collection families.
The operational model is based on an algebra of collections. This mirrors the relational algebra, which was fundamental to the success of the relational model. Its success was due to its simplicity, uniformity and high-level query languages stemming from the introduction of the single generic collection type - the relation. The basis for its high-level query languages was an algebra of operations on these collections as opposed to the notion of operations on individual data records that had underpinned the network and hierarchical data models. Unfortunately, the drawback of the relational model is that it is just too simple and lacks semantic structure.
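Two of the conceptual dependencies mentioned in the discussion of classification structures, mutual exclusion and partitioning, can be stated as simple checks over sets. This is an illustrative sketch only: the function names and the example categories are hypothetical and do not reproduce the definition of collection families.

```python
from itertools import combinations


def mutually_exclusive(*colls):
    """True if no value belongs to more than one of the given collections."""
    return all(a.isdisjoint(b) for a, b in combinations(colls, 2))


def partitions(parent, *colls):
    """True if the collections are mutually exclusive and together
    contain exactly the members of `parent`."""
    return mutually_exclusive(*colls) and set().union(*colls) == parent


# Hypothetical categories.
Students = {"ann", "bob", "cat"}
Undergraduates = {"ann", "bob"}
Postgraduates = {"cat"}
TennisPlayers = {"bob", "cat"}
```

Here Undergraduates and Postgraduates partition Students, whereas TennisPlayers overlaps both and so takes no part in that partition.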
Although there have been some proposals for an algebra which operates on collections of objects, a number of object-oriented database systems use object-at-a-time processing and have thereby lost the advances of the relational model in terms of its high-level query processing. A collection algebra can form the basis of high-level query languages for object-oriented systems and, importantly, the optimisation of query expressions and query evaluation strategies.
In effect, the operational model also supports operations on a database in that an operation involving one collection can generate operations on other collections as determined by the conceptual dependencies among collections. For example, deleting
collections which are dependent on that collection. Thus, if we were to delete a particular student object from the collection Persons then we would also have to delete it from the collection Students if there is a dependency that every member of Students is also a member of Persons.
In summary, the collection model presented in this thesis incorporates many of the favourable features of the relational, entity-relationship and semantic data models. It has direct support for the representation of relationships; it supports rich classification structures; and it has an operational model based on an algebra of collections.
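The way a deletion from one collection can generate deletions on dependent collections can be sketched as follows. The class `Database` and its methods are hypothetical, and only the single dependency described above (every member of a subcollection is also a member of its parent) is modelled.

```python
class Database:
    """Collections with subcollection dependencies (illustrative sketch)."""

    def __init__(self):
        self.collections = {}      # name -> set of members
        self.subcollections = {}   # name -> names of direct subcollections

    def add_collection(self, name, members=(), parent=None):
        # Assumes the parent, if given, has already been added.
        self.collections[name] = set(members)
        self.subcollections.setdefault(name, [])
        if parent is not None:
            self.subcollections[parent].append(name)

    def delete(self, name, member):
        # Remove the member from the collection itself ...
        self.collections[name].discard(member)
        # ... and then from every collection that depends on it.
        for sub in self.subcollections[name]:
            self.delete(sub, member)


db = Database()
db.add_collection("Persons", {"ann", "bob"})
db.add_collection("Students", {"ann"}, parent="Persons")
db.delete("Persons", "ann")   # also removes ann from Students
```

After the deletion, the dependency that every member of Students is also a member of Persons still holds without any extra work by the caller.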
1.3
Structure of Thesis
This thesis presents a general data model which may form a foundation for the development of data management services in object-oriented systems. It assumes as a platform any reliable, persistent object store and considers the provision of expedient data management through a structural and an operational model that together form the particular data model referred to as the collection model. This model can then be used as a basis for efficient data management by means of multiple physical representation structures and query optimisation techniques.
Hence, the focus of this work is on the data modelling aspects of object-oriented database systems. In particular, there is an attempt to redress the apparent imbalance in many proposed object-oriented database systems where the emphasis has been on effectual and efficient data management and the issue of expediency has been somewhat neglected. As a result, many of the existing systems provide little support for the concepts that have become central to the work on data modelling. Further, by omitting support for the higher-level database structures of data models, the attention to efficiency addresses access to individual objects or single collections of objects. By supporting database structures involving multiple collections of objects, and operations on these collections, optimisations can be made at a higher level which means that they tend to be more global and less localised. At this higher level, the optimisation techniques are better able to utilise semantic information of the application domain.
We begin in Chapter 2 with a discussion on the foundations of data models. A data model determines the basic constructs available for the construction of conceptual models of application domains. We therefore examine the general philosophical foundations of conceptual modelling as a basis for determining the basic requirements of data models. From these requirements, we present a general framework in which to consider the main characteristics of the various categories of data models. The
Chapter 3 deals with the structural aspects of the collection model. The specification of the BROOM model is presented in four stages. Firstly, there is an informal overview which describes the main features of the model and looks at some simple examples. Next, the fundamental concepts on which the model is built, namely collections and collection families, are presented in detail. This is followed by a metacircular description of the BROOM model in which the model is described in terms of itself. This description is used as an intermediate stage of specification which is refined into a formal specification in the language Z [Spi89], [Dil90], [PST91]. Such a metacircular description is also useful both as a documentation aid for the model and as a basis for supporting the uniform treatment of data and metadata.
The semantic modelling capabilities of the BROOM model are examined in Chapter 4. First, the support for relationships is discussed in detail. Then each of the semantic data modelling abstractions referred to as aggregation, generalisation and association is examined with examples to demonstrate how these would be represented in the BROOM model.
The operational aspects of the collection model are presented in Chapter 5. Three levels of operation are possible and the chapter begins with an examination of these levels. The main theme of the chapter is the presentation of a collection algebra which deals with operations on collections. The properties of the algebra are presented, followed by a discussion of how the associated algebraic transformations could be used in query optimisation.
A database is not a static entity but rather is dynamic in that it evolves over time. The entities represented will change and also the forms of their representations may change as entities adopt different roles throughout their lifetime. In addition, the structure of the database may evolve either to reflect changes in the real world systems that they model or because of changes to the requirements of the database system. In Chapter 6, we discuss the various forms of database evolution and how these can be supported. In particular, we propose an extension to the collection model to support object evolution.
The collection model was developed within the Comandos project [CBHdP93]. Comandos is an Esprit project concerned with the construction and management of distributed open systems. In Chapter 7, we describe how the collection model was realised as part of a Comandos system. The collection model was designed as a general model and is not specific to the Comandos system. To illustrate this point, we also describe a prototype object data management system, COLLEEN, which was based on the collection model and implemented in MacProlog [LPA91].
Chapter 2
Foundations of Data Models
The collection model proposed in this thesis is a particular data model which primarily was designed to support data management in object-oriented systems. Before going on to present this model, we first consider in some detail exactly what a data model is and what its requirements are, both in general, and also in the specific context of object-oriented systems.
A data model supports the construction of a model of a data-intensive application system with the intention of representing that application domain by means of a database system. The process of constructing an application model using a particular data model is referred to as data modelling. We term the constructed model a conceptual model of the application domain. Such a conceptual model should be adequate in that it should capture the relevant features of the application domain, and, furthermore, it should be natural in that it should correspond to the sorts of mental models that users construct for mental processing.
The general area of study concerned with the construction of models which correspond directly and naturally to our own conceptualisations of reality is known as either conceptual modelling or cognitive modelling. The process of data modelling is a special case of conceptual modelling and it follows that the foundations of conceptual modelling are an important starting point in an attempt to determine the general requirements of data models. Therefore, this chapter begins with an examination of some of the philosophical foundations of conceptual modelling that are particularly pertinent to database systems and give some insight into the underlying basis for the proposed collection model.
Arising from these philosophical considerations, we arrive at some requirements for data models that in turn form the basis of a general framework for data models. We present this framework in section 2.2 and then go on to consider the various categories of existing data models in terms of this framework in section 2.3.