Terms and Conditions of Use of Digitised Theses from Trinity College Library Dublin
Copyright statement
All material supplied by Trinity College Library is protected by copyright (under the Copyright and Related Rights Act, 2000 as amended) and other relevant Intellectual Property Rights. By accessing and using a Digitised Thesis from Trinity College Library you acknowledge that all Intellectual Property Rights in any Works supplied are the sole and exclusive property of the copyright and/or other I PR holder. Specific copyright holders may not be explicitly identified. Use of materials from other sources within a thesis should not be construed as a claim over them.
A non-exclusive, non-transferable licence is hereby granted to those using or reproducing, in whole or in part, the material for valid purposes, providing the copyright owners are acknowledged using the normal conventions. Where specific permission to use material is required, this is identified and such permission must be sought from the copyright holder or agency cited.
Liability statement
By using a Digitised Thesis, I accept that Trinity College Dublin bears no legal responsibility for the accuracy, legality or comprehensiveness of materials contained within the thesis, and that Trinity College Dublin accepts no liability for indirect, consequential, or incidental, damages or losses arising from use of the thesis for whatever reason. Information located in a thesis may be subject to specific use constraints, details of which may not be explicitly described. It is the responsibility of potential and actual users to be aware of such constraints and to abide by them. By making use of material from a digitised thesis, you accept these copyright and disclaimer provisions. Where it is brought to the attention of Trinity College Library that there may be a breach of copyright or other restraint, it is the policy to withdraw or take down access to a thesis while the issue is being resolved.
Access Agreement
By using a Digitised Thesis from Trinity College Library you are bound by the following Terms & Conditions. Please read them carefully.
A F lex ib le Fram ew ork for D istr ib u te d Shared O b je c ts
Stefan W eber
A thesis su b m itte d to tlie U niversity of D ublin, T rinity College
in fulfillm ent of th e requirem ents for the degree of
D octor of Philosophy (C ouij)uter Science)
1 0 OCT ^'002
i^ ^ U B ^ Y D U B L l^
D ecla ra tio n
I, the undersigned, declare th a t this work has not previously been su b m itte d to th is or any
other University, and th a t unless otherw ise s ta te d , it is entirely my own work.
Stefan Weber
I, the undersigned, agree th at Trinity College Library may lend or copy this thesis upon
request.
Stefan Weber
A cknow ledgem ents
F irst and forem ost, th an k s are due to my two supervisors B rendan Tangney and P a d d y Nixon for their endless efforts and infinite patience while supervising me. I’m still surprised th a t B re n d a n ’s h air did not tu rn gray because of th e rep e a te d delays th a t my difficulties w ith the w riting of th is te x t caused.
T h an k s are also due to V inny Cahill, for his h ard work to keep our group ru n n in g and to provide us w ith funding for equijnnent and travels; w ithout his continuous efforts the D istrib u ted System s G roup w ouldn’t be th e same.
I am very grateful to In m aculada A rnedillo Sanchez and C h ristian Jensen, who - w hen I was ab o u t to give uj) in th e frequent m om ents of self-doubt - set me on th e right p a th again.
And finally th an k s are due to all m em bers of DSC for th e ir company, th eir p atience w ith me and th eir good advice: Stei)hen, Frank, M ads, Jim , T im , T ilm an, Ray, Rene, P eter, Donal, Jo h n , Jo h an , and m any more. “Stick to g eth er team !”
At last b u t not least, I ’m very grateful to my fam ily for th eir co n stan t su p p o rt and encouragem ent du rin g th e years. T his thesis is dedicated to them .
S tefan W eb er
A d istrib u te d shared m em ory (DSM) system ensures the consistency of sh ared d a ta in a d istrib u ted system while providing the program m ing paradigm of a single-processor system . Recent DSM system s provide increasingly m ore su p p o rt for th e sharing of objects ra th e r th an portions of memory. However, like earlier DSM system s these d istrib u te d shared object (DSO) system s still rely upon a single protocol, or a sm all set of given protocols, for the sharing of applicatio n objects. T h is lim itation prevents th e applications from optim izing the underlying com m unication behaviour on a per-ap p licatio n basis. T his resu lts in unnecessary overhead which im pacts upon perform ance which is th e m ain goal of parallelism .
A cu rren t tre n d in software developm ent is tow ard custom izable system s; for exam ple fram eworks, reflection, and aspect-oriented program m ing all aim to give th e developer greater flexibility and control over the functionality and perform ance of th eir code. T h e lack of stru c tu re in protocols used in DSM system s prevents th e in tro d u ctio n of this kind of a d a p ta b ility into DSM system s.
T his thesis categorizes DSM protocols into consistency m odels and coherency protocols. C onsistency m odels define the order in which accesses to shared d a ta are experienced by nodes of a parallel system . Coherency protocols describe the m ethod by w hich resu lts of accesses to shared m em ory are exchange betw een nodes.
description is novel to our knowledge.
T he definition of th is relationship allows the com bination of consistency m odels and co herency protocols in a tru ly com ponent-based fashion. It allows a consistency m odel or coherency protocol to be created from subcom ponents and be a d ju sted to changing environ m ents by ad ju stin g the sub-com ponents of th e com ponents. T h is is th e fu n d am en tal insight th a t allows us to build system s th a t can be a d a p te d to application-specific needs.
T he definition of the com ponents and th eir relatio n sh ip s is tran sferred into a fram ew ork th a t is im plem ented using Java. T his fram ew ork provides the developer w ith interfaces and classes for consistency m odels and coherency protocols and th eir sub-com ponents. T h e interfaces define to which im plem entations of coherency protocols and consistency m odels m ust comply. T his m aintains th e interchangeability of com ponents of th e sam e type in the fram ework. T he classes provided by th e fram ew ork can be divided into concrete and a b stra c t classes. T h e concrete classes represent com plete im plem entations of consistency models and coherency protocols. These im plem entations can be readily used in applications as building blocks for a consistency m aintenance protocol. T he a b s tra c t classes of th e fram ew ork represent base classes th a t can be extended to create new im plem entations of consistency model and coherency protocols in order to a d ju st a consistency m aintenance protocol to api)lication-sj)ecific needs.
[1] S. Weber and P. Nixon. An Object Oriented DSM Framework. Proceedings of IEEE Frontiers’96, Annapolis, M aryland, O ctober 1996.
[2] A. Judge, P. Nixon, B. Tangney, S. Weber and V. Cahill. High Performance Cluster Computing, volume 1, chapter 17, pages 409-438. Prentice Hall, 1999.
C ontents
A c k n o w le d g e m e n ts iv
A b s tr a c t iv
L ist o f T a b le s x iii
L ist o f F ig u r e s x iv
C h a p te r 1 I n t r o d u c tio n 1
1.1 Prograniniiiig p a r a d ig m s ... 1
1.2 D istrib u ted S hared M emory s y s t e m s ... 2
1.3 A d a p ta b ility ... 4
1.4 T h e s i s ... 5
1.5 E v a lu a tio n ... 6
1.6 R oad M a p ... 7
1.7 S u m m a r y ... 7
C h a p te r 2 R e la te d w o r k 9 2.1 M otivation ... 9
2.2 A d a p ta b ility in DSM s y s t e m s ... 11
2.2.1 System a d a p t a b i l i t y ... 12
2.2.2 Protocol a d a p ta b ility ... 13
2.2.5 Dynamic a d a p t a b i l i t y ... 15
2.2.6 Summary of A daptability C a te g o rie s ... 16
2.3 Types of DSM s y s te m s ... 16
2.3.1 Hardware-assisted shared m e m o r y ... 18
2.3.2 Page-based software systems ... 23
2.3.3 Object-based systems ... 29
2.3.4 S u m m a r y ... 30
2.4 Consistency m o d e ls... 30
2.4.1 Strict consistency m o d e l s ... 32
Sequential consistency ( S C ) ... 32
Atomic consistency (AC) ... 33
Processor consistency ( P C ) ... 33
Cache consistency ( C C ) ... 34
2.4.2 Relaxed consistency m o d e ls ... 34
Weak Consistency ( W C ) ... 35
Release Consistency ( R C ) ... 36
Lazy Release Consistency ( L R C ) ... 36
2.4.3 Categorization of consistency m o d e ls ... 36
2.4.4 Summary of consistency m o d e ls... 37
2.5 Related projects ... 38
2.5.1 T rea d M a rk s... 38
2.5.2 CVM ... 39
2.5.3 M vm in... 40
2.5.4 O r c a ... 41
2.5.5 D is o m ... 42
2.5.6 R a b b it... 42
2.5.7 Summary of related projects ... 42
C h a p t e r 3 A n ew c a te g o r iz a tio n o f c o n s is te n c y m a in te n a n c e p r o to c o ls 45
3.1 Overall structure of consistency m aintenance p ro to c o ls ... 46
3.1.1 Components of Consistency Maintenance P ro to c o ls ... 47
3.2 Consistency Models ... 48
3.2.1 Properties of consistency m o d e l s ... 49
E x a m p le s ... 50
3.2.2 Prim itives in consistency m o d e l s ... 50
Description of p rim itiv es... 51
Semantics of primitives ... 52
3.2.3 C haracterization of consistency m o d e ls... 53
Strict consistency m o d e l s ... 53
Relaxed consistency m o d e ls ... 56
3.3 Coherency P r o to c o ls ... 59
3.3.1 Reaction on the receipt of p rim itiv e s ... 60
T ra n sfo rm a tio n ... 61
Distribution a l g o r i t h m s ... 61
3.3.2 Components of Coherency P r o to c o ls ... 64
Memory management ... 67
Ownership m a n a g e m e n t... 70
D istribution m a n a g e m e n t... 71
3.3.3 E x a m p le s ... 72
3.4 Complete Examples ... 72
3.4.1 T re a d M a rk s... 72
3.4.2 D A S H ... 74
3.5 S in n m a r y ... 75
C h a p te r 4 I m p le m e n ta tio n 76 4.1 Framework o v e rv ie w ... 77
4.2 Supporting c l a s s e s ... 79
Intention ... 82
4.2.2 RMI m e c h a n is m ... 83
4.2.3 Synchronization classes ... 86
L o c k s ... 86
B a r r i e r s ... 90
4.2.4 Execution environm ent ... 90
4.3 Consistency m o d e l ... 92
Generic B ase-C lasses... 92
Im plementation of a consistency m o d e ls ... 92
4.3.1 Overview of the class h ie ra r c h y ... 93
4.3.2 Weak C o n s is te n c y ... 96
4.3.3 Modified Weak Consistency ( M W C ) ... 97
4.3.4 Lazy Release Consistency ( L R C ) ... 97
4.4 Coherency p r o to c o l s ... 97
4.4.1 General behaviour of coherency p r o to c o l s ... 98
4.4.2 Memory M a n a g e m e n t... 99
Update ... 100
Example of a memory management c la s s ... 101
4.4.3 Distribution m a n a g e m e n t... 101
Example of a distribution management class ... 103
4.4.4 Ownership m a n a g e m e n t... 103
Example of an ownership management c l a s s ... 105
4.4.5 Implementation of a coherency p r o t o c o l... 105
4.4.6 Class hierarchy of coherency p r o to c o ls ... 108
4.4.7 Implemented Coherency P r o to c o l s ... 109
4.5 Deployment of a shared object ... 110
4.5.3 U tiliz a tio n ... 112
4.6 S u m m a r y ... 113
C h a p te r 5 E v a lu a tio n 114 5.1 E nvironm ent ... 116
5.1.1 System e n v ir o n m e n t ... 116
5.1.2 Software e n v i r o n m e n t ... 118
5.2 T h e Traveling S a le s p e r s o n ... 118
5.3 M atrix M ultiplication ... 120
5.4 T h e G am e of L I F E ... 123
5.5 D is c u s s io n ... 126
C h a p te r 6 C o n c lu s io n 128 6.1 C o n tr ib u tio n s ... 128
6.2 F u tu re w o r k ... 129
6.3 S i u n m a r y ... 130
2.1 Sum m ary of related p r o je c ts ... 43
3.1 D escription of T re a d M a rk s ... 73
3.2 D escription of D A S H ... 74
5.1 E valuation m e t r i c s ... 114
5.2 B est perform ing C M /C P c o m b in a tio n s ... 126
List o f Figures
2.1 G eneral m odel of shared m e m o r y ... 17
2.2 Schem atic of a bus-based DSM s y s t e m ... 19
2.3 2-diniensional m esh of a c r o s s b a r ... 20
2.4 KSR-1 A rchitecture ... 22
2.5 Sim ple exam ple of race c o n d i t i o n ... 31
2.6 H istory th a t satisfies processor c o n s is te n c y ... 33
2.7 H istory th a t satisfies cache c o n s is te n c y ... 34
2.8 C ategorization of consistency m o d e l s ... 37
3.1 C om ponents of consistency m aintenance protocols ... 48
3.2 Sem antics of p rim itiv e s ... 53
3.3 Sem antics for a seq. cons, m o d e l ... 54
3.4 Sem antics for a cache cons, m o d e l ... 55
3.5 Sem antics for a proc. cons, m odel ... 56
3.6 Sem antics for a weak cons, m o d e l... 57
3.7 Sem antics for a release cons, m o d e l... 58
3.8 Sem antics for a lazy release cons, m o d e l ... 59
3.9 C om ponents of coherency p ro to c o ls ... 65
4.1 Framework O v e rv ie w ... 78
4.2 Interface of the D escriptor c l a s s ... 80
4.5 Example of a rem ote m ethod in v o c a tio n ... 84
4.6 Interface of the Lock c l a s s ... 86
4.7 Interface of the Barrier c la s s ... 90
4.8 M aster/W orker r e la tio n s h ip ... 91
4.9 Consistency Model Class H ie ra rc h y ... 95
4.10 Interface of the Coherency Protocol c l a s s ... 98
4.11 Interface of the Memory management class ... 100
4.12 Interface of the D istribution management c l a s s ... 103
4.13 Interface of the ownership management class ... 103
4.14 Coherency Protocols Class H ie ra rc h y ... 108
4.15 Interface of the SharedObject c l a s s ... I l l 5.1 Speed-up Curve for T S P ... 121
5.2 Speed-up Curve for M atrix M u ltip lic a tio n ... 123
5.3 Communication between S t r i p s ... 124
5.4 D istribution of Conway’s Game of L i f e ... 125
C hapter 1
In trod u ction
Tlie programming of a single-processor system is a relatively simple task. A single-processor system handles one instruction at a time and stores results of operations in a local memory. All instructions see the same d ata and a read operation returns the result of the last write operation.
In contrast to single-processor systems, parallel systems process a number of tasks at the same time. These tasks exchange d ata with one another and need to be coordinated occasionally. The programm ing of these tasks is being facilitated by progranmiing paradigms tliat focus on the development of programs for parallel systems.
The two prevalent progrannning paradigms for the development of parallel systems are message j)assing and (distributed) shared memory. These two paradigm s approach the com munication between nodes very differently. Message passing is based on user-involvement in the coordination of communication efforts; distributed shared memory attem pts to be transparent to the developer.
1.1
P ro g ra m m in g paradigm s
contain only problem-specific data. The coordination of the individual programs is completely in the control of the programm er. The individual program executes on each node like a traditional program. The separation of an application into a number of individual programs makes the execution of the application easy to understand for programm ers th a t are used single node applications.
The message passing paradigm is inconvenient when programm ing applications with a high am ount of communication because all communication between nodes has to be arranged by the developer. The communication between a small set of nodes is not very complex and can be coordinated by the developer w ithout great difficulty; however, it becomes increasingly difficult with a growing num ber of nodes. In other words, message passing scales very badly towards large parallel systems.
Shared memory offers a simple programming paradigm to the developer. The commu nication and synchronization between the tasks th a t are executed on nodes of a parallel system consists of memory accesses. The developer is used to this form of communication from single-processor systems. The system is responsible to provide an image of a single shared memory and the complexity of the underlying system is completely hidden from the developer.
1.2
D istr ib u te d Shared M em ory sy ste m s
C h a p te r 1. Introdu ction
nodes.
Shared memory systems based on distributed parallel systems employ protocols th a t translate memory accesses into messages between the nodes of a system. These protocols are called consistency maintenance protocols. They ensure th a t memory accesses on one node are visible to other nodes in a predictable manner.
The maintenance of consistency of shared d ata involves a num ber of steps. T he node where an access originates needs to determ ine the memory locations th a t are involved in the access. The access may have to be synchronized w ith accesses made by other nodes in the system. This may require the exchange of synchronization messages. An access may consist of a retrieval or a modification of the value of a memory location. In case of a modification of the value of a memory location the new contents may have to be distributed among several nodes of the system who posses a replica of the data. This exchange of updates may involve the exchange of messages with one or more of the nodes in the system.
The predictability of the outcome of memory accesses is challenged by the complexity of distributed systems. A distributed system combines a num ber of individual and independent elements like computing nodes and network components th a t influence the tim ing of mes sages and the execution of operations. These factors complicate a predictable execution of a program in a distributed system.
Th(^ time a message takes to be delivered from the sender to the receiver in a distrib uted system depends on various factors. The sender employs a protocol stack th a t prepares and sends the message over the wire. A similar protocol stack is employed by the receiver. These protocol stacks handle incoming and outgoing messages and transform them into th e form at of underlying transp ort layers such as the physical layer. The execution of these protocols and the transfer of the messages of a phsical medium
A protocol defines tlie synchronization of d istrib u ted operations and the handling of messages. This sjjecification guarantees a predictability of the behaviour of memory accesses.
The main issue addressed by most protocols is the am ount of d ata and the frequency of network traffic caused by consistency m aintenance. Network traffic has proven to be the major bottleneck in distributed systems despite advances in the speed of tran sp o rt media and communication protocols. Consistency m aintenance protocols aim to reduce network traffic by coupling protocols close to the underlying com m unication protocol and by gathering messages into blocks of messages.
Recent shared memory systems feature consistency m aintenance protocols th a t facilitate distributed shared memory on various platforms and to adjust systems to various require ments. A number of protocols provide enhancem ents such as gathering of writes in order to reduce network traffic, increased concurrency to enhance the time spent processing, etc.
Research in the area of DSM systems has produced an abim dance of protocols - each with its individual strength and weaknesses. These protocols vary in the concurrency they alkw and in the network traffic they cause. However, very few systems provide the flexibility to adjust the consistency m aintenance to the needs of an application. Most systems focus on one as])ect of distributed shared memory and fimit themselves to a single solution th a t has to serve all applications.
As Munin[14] showed, not all applications exhibit the same access p attern s to shared data. Applications share d a ta in very specific ways. This can be exploited if the devel oper is allowed to provide knowledge about the d a ta and their sharing characteristic. D ata structures can hold for example initialization data. This d ata is distributed at the sta rt of an application and not updated afterwards. The knowledge of this access p attern allows to reduce synchronization efforts in relation to this data.
1.3
A d a p ta b ility
C h a p te r 1. Introdu ction
by individual nodes to shared data may perform poorly when presented with a problem th a t require individual concurrent accesses by individual nodes.
A number of system s[ll, 14, 79] have attem pted to adap t their behaviour to the usage pattern displayed by applications. These systems include a fixed set of protocols from which a suitable protocol is chosen according to the needs of the application. The protocol is chosen by a runtim e analysis of the sharing characteristics exhibited by an application and a decision process th a t is coded into the DSM system when it is build.
This movement towards adaptation follows a trend in the general software development towards increased customizability. During recent years a mnnber of techniques th a t aim towards customizability have found growing acceptance. O bject-oriented programming, for instance, has produced the idea of frameworks of components th a t implement a number of solutions for a problem. These solutions are implemented as components w ith defined characteristics and interfaces. The components are interchangeable with each other and allow the developer to adapt an existing solution by exchanging a component currently in use by a system against a more suitable component according to the problem at hand.
Aspect-oriented progranuning and refiection[61] take adaptability a step further by pro viding developers with mechanisms th a t allow the separation of functional and non-functional code. The fimctional code of a system imj)lements the core functionality which defines the system; the non-functional code implements supportive functionality like the persistence of data. The fmictional and non-functional code are combined by the reflective system to form an executable apj)lication.
The separation of concerns allows a developer to create functional code w ithout the need to determ ine possible changes in the runtim e environment. The application can be adapted to its execution environment by providing corresponding non-functional code. This ad aptation happens w ithout changes to the application or recompilation of the application code.
1.4
T h esis
in consistency maintenance protocols. We propose a new characterization for consistency maintenance protocols based on the division of these protocols into consistency models and coherency protocols. One of the contributions of this thesis is the precise definition of consis tency models and coherency protocols as com ponents of consistency m aintenance protocols. This definition includes a description of all characteristics of the components and their sub components. The definition of these components and their interfaces is then used to describe their relationship. This description is novel to our knowledge.
This division into components and a further structuring into subcom ponents allow us to adapt a consistency maintenance protocol to the sharing characteristics of a particular application. Every instantiation of a sub-component can be exchanged against another in stantiation of the same sub-component. This exchangeability aids the adaptation.
The definition of the components and their relationships is translated into a framework th a t is implemented using Java. This framework provides the developer with interfaces and classes for consistency models and coherency protocols. The interfaces define methods th a t coherency f)rotocols and consistency models must f)rovide. The enforcement of these interfaces m aintains the exchangeability of components of the same type in the framework. The classes j^rovided by the framework can be divided into concrete and ab stract classes. The concrete classes represent complete implementations of consistency models and coherency protocols. These implementations can be readily used in applications as building blocks for a consistency m aintenance protocol. The abstract classes of the framework represent base classes th a t can be extended to create new im plem entations of consistency model and coherency protocols in order to adjust a consistency m aintenance protocol to application- specific needs.
1.5
E valu ation
C h a p te r 1. In trodu ction
a selection from the spectrum of possible sharing characteristics. The flexibility th a t is offered by the framework is exploited to adjust the consistency m aintenance to suit the individual sharing characteristic. It is dem onstrated how consistency m aintenance protocols can be cre ated by combining individual components th a t are provided by the framework. This shows the general applicability of our approach. A further dem onstration of the framework shows how the flexibility provided can be used to create solutions th a t have not been possible before because of the lack of structure in consistency m aintenance protocols.
These im plem entations dem onstrate th a t the definition of the components of consistency maintenance protocols and the transfer of these definitions into a framework results in a high degree of flexibility. This level of flexibility surpasses the flexibility th a t is offered by traditional approaches to DSM systems.
1.6
R oad M ap
The remainder of this text is structured as follows; C hapter 2 gives an overview of the related work in the area of distributed shared memory. This chapter provides the fundam ental Imck- ground inform ation th a t is used in the subsequent chapter to develop a new characterization for DSM systems. C hapter 3 gives a detailed description of our new characterization of con sistency m aintenance protocols. This characterization is translated in chapter 4 into a DSM framework. C hapter 5 presents a set of applications th at have been implemented using the framework. The performance shown by these applications is used to evaluate the framework and to support our thesis. The thesis concludes with a chapter on the conclusions th a t can be made from our characterization and from the evaluation of our framework.
1.7
S u m m a ry
C hapter 2
R ela ted work
A iiuiiiber of ideas and concepts th a t are found in d istrib u te d shared m em ory system s can trace their origin in the area of jiarallel hardw are. In o rder to place th e description of our work in context we review the designs and concej)ts of i)arallel and d istrib u te d system s. T his review can also be understood as th e m otivation in our search for increased flexibility and the resulting definition of our ch aracterization of consistency m aintenance protocols.
'I’lie first section of this chapter will relate tre n d s in the general area of d istrib u te d system s area. These trends can be seen as th e background and m otivation for th e work presented in this thesis. T he chapter continues w ith a descrijjtion of relevant types of DSM system s and issues of consistency m aintenance. T his description is followed by a detailed ex p lan atio n of consist(uicy models. T he chapter is concluded w ith a section ab o u t DSM pro jects th a t share a close relationship w ith the work presented in this thesis.
2.1
M o tiv a tio n
aims.
Perform ance of software is generally in d icated by th e speed w ith w hich a problem is solved. One of the m ost influential factors th a t determ in e th e speed of a solution is the degree of p a rtic u la rity of the im plem entation for a specific problem . In order to solve a problem m ost efficiently a program has to be im plem ented as problem -specific as possible. An im plem entation has to take advantage of all possible factors th a t have an influence on the execution speed of th e application. T hese factors include problem -specific inform ation such as value co n strain ts and fram e conditions of form ulas, inform ation ab o u t th e hardw are th a t is used as im plem entation base such as hard w are im plem ented acceleration and inform ation ab o u t the language th a t is used to im plem ent th e solution such as th e handling of m atrices in Fortran.
All these factors create an enorm ous space of possible solutions w here each set of values represents an efhcient solution for a p a rtic u la r problem . T hese solutions are im plem ented as protocols for various im plem entations. Each of these protocols has a different em phasis and represents an efficient solution for a p a rtic u la r case. In o rder to build an efficient application a develop(!r has to choose a num ber of these i)rotocols and com bine them into an application. T his com bination of protocols im plem ented for one applicatio n can usually not be reused for other applications.
T his m ethod - of building extrem ely problem -specific solutions - conflicts w ith the quest for ever-faster software developm ent m echanism s. T he search for faster software developm ent m ethods is m otivated by in d u stry ’s desire to speed-up th e developm ent of software and to achieve a faster tim e to the m arket for th eir p roducts.
C h apter 2. R elated work
However, high-level co n stru cts in h ib it th e developm ent of highly specialized solutions. Every high-level co n stru ct is m ade up of a num ber of low-level operations. T hese o perations are th e sam e in every context w here th e high-level co n stru ct is used. T h u s an im plem entation of a high-level co n struct cannot take advantage of all perform ance enhancem ents th a t may be available.
A possible solution to th e conflict betw een th e aim of high-level solution an d the use of highly problem -specific code is ad ap tab ility . A d a p ta b ility is a technique th a t enables a system to em ploy high-level solutions while th e underlying code is highly problem -specific. An a d a p ta b le system contains a num ber of im plem entation of protocols w ith sim ilar functionality. A protocol is chosen to suit a given environm ent an d can be exchanged for a n o th e r protocol if th e ch aracteristics of an environm ent change.
For exam ple, a chat system m ay provide th e abilities to send a n d receive messages. Two possible environm ents for this system can be envisioned: closely-coupled sytem s such as clusters and a set of com puters connected by a w ide-area network. T h e delivery of messages in these environm ents differs in term s of reliability. In m ost closely-coupled system s, the delivery of netw ork traffic is very reliabk;. T h e system does not need to ensure th a t a packet - once it is send - arrives a t th e receiver in th is kind of envirom nent. However, if th e system is used in coim ection w ith a w ide-area netw ork, th e delivery of messages becom es m ore complex and th e reliability is often questionable. In this environm ent, th e com m unication system has to provide a d d itio n a l functionality th a t ensiu'es the delivery of messages.
T he system m ay have two im plem entations for message delivery th a t are each suitable for the corresponding environm ent. A su itab le im plem entation is chosen according to the environm ent.
2.2
A d a p ta b ility in D S M s y ste m s
• System a d a p ta b ility • P rotocol a d a p ta b ility • O bject-based a d a p ta b ility • R untim e a d a p ta b ility • D ynam ic ad a p ta b ility
2.2.1
S ystem ad ap tab ility
System ad ap tab ility is ad opted by DSM system s th a t em ploy system -w ide a single protocol to ensure th e consistency of d a ta . A protocol is chosen before th e deploym ent of th e system , i.e., a t compile-tim e. T h is protocol stays fixed du rin g th e lifetim e of a system , i.e., im til a recom pilation of the system .
System s th a t provide system a d a p ta b ility allow protocols to be changed in order to a d a p t to relatively infrequent changes in the environm ent, e.g., th e a d a p ta tio n to a new hardw are platform . These infrequent changes m ay require m ore efibrt from th e m ain tain er of a system and a reconfiguration a n d recom pilation of th e system is necessary to achieve a change.
I 'h is type of a d a p ta b ility is generally facilitated by th e definition of interfaces. An in terface specifies the access to the im plem entations of protocols. It defines the m eans th a t a system uses to access a protocol and hides th e in tern al functionality of an im plem entation.
A protocol for a system th a t provides system a d a p ta b ility needs to be developed against such an interface. An interface has to accom m odate all com m unication betw een a protocol and a system . A change in the interface requires changes in all protocols th a t are to be used w ith th e system . A new protocol th a t is installed in a system replaces an existing protocol. No two protocols are su p p o rte d by th e system a t any tim e.
Chapter 2. R elated work
System a d a p ta b ility is generally provided by DSM system s th a t are im plem ented as p a rt of an operating system [38, 37, 62], It has its equivalent in th e w hite box approach [60] th a t is employed in the general com puter science com m unity.
2 .2 .2
P r o to c o l a d a p ta b ility
Protocol a d a p ta b ility represents th e runtim e-equivalent to system -adaptability. It allows a user to choose a system -w ide protocol prior to the execution of an appU cation.
T his m echanism requires th e im plem entation of a nu m b er of protocols in th e system prior to its deploym ent. T h e num ber of protocols rem ains fixed d u rin g th e lifetim e of a system . A system has to be re-deployed if a protocol is to be added to an existing set of protocols. T his redeploym ent is th e equivalent of an in tro d u c tio n of a new protocol into a system th a t provides system adaptability.
A user chooses a t compile- or ru n -tim e of an a p p licatio n a protocol th a t is suited to a p articu lar problem a t hand. T his protocol is then used du rin g th e ru n tim e of the system to ensure the consistency of all shared d ata.
This m echanism usually introduces an interm ed iate layer th ro u g h which all protocols are accessible. T his interm ediate layer rej)resents an a d d itio n a l indirection du rin g th e execution of tlie system . C om pared to th e in tro d u c tio n of an interface in system adaptabiU ty this m echanism needs ad d itio n al inform ation in order to provide th e fiexibihty betw een different protocols.
T he choice of th e protocol is usually im plem ented as sw itch or configuration o p tion [93]. Different instances of the system th a t are executed a t the sam e tim e can use different p roto cols.
Protocol ad a p ta b ility is provided by system s th a t are im plem ented as stan d alo n e system s or libraries such as CVM [55, 56].
2 .2 .3
O b jec t-b a sed a d a p ta b ility
system s or D istributed Shared O bject (DSO) system s allow a developer to define the granu larity of sharing.
Johnson et al. developed w ith th e C Regions Library ( C R L ) [50, 51] an all-software DSM system th a t is im plem ented as a lib rary for C program s. T h e system allows th e developer to specify any kind of C stru c tu re th a t is to be shared. T h e s tru c tu re s are kept consistent by a single protocol th a t reacts on aim o tatio n s given by th e program m er.
B ershad et al. im plem ented a sim ilar approach w ith Midway [19, 95]. M idway is im plem ented as a com piler and a ru n tim e environm ent. O b jects in th is system are associated w ith a synchronization object. Accesses to objects trigger a m echanism th a t invokes the syn chronization object and th ro u g h th is th e underlying protocol th a t provides th e consistency m aintenance.
Both CRL and M idway are object-based system s th a t provide a single protocol th a t en forces the consistency am ong objects in th e system . T hese system s are object-based b u t do not provide object-based adaptability. O bject-based a d a p ta b ility defines th e specifica tion of protocols on a j)er-object basis. T his m eans th a t system s th a t provide object-based a d ap tab ility inipkinicnt a num ber of protocols th a t can be assigned on a per-object basis. T his m(;chanism allows the a d a p ta tio n of a DSO system to th e needs of an application in a fine-grained m anner.
A num ber of system s [74, 75, 53] require a n n o ta tio n s to th e program code by developers. These an notations determ ine th e protocols th a t are to be used. O rca [11] avoids an n o tatio n s and achieves o b ject-ad ap tab ility through th e analysis of source code at compile tim e. T he analysis determ ines th e sharing p a tte rn s of individual o b jects a n d advises an underlying runtim e system on the choice of correct protocols.
System s th a t provide object-based ad a p ta b ility require th e im plem entation of protocols prior to the deploym ent of a system . T h e num ber of these protocols rem ains fixed during the lifetime of the system.
Chapter 2. Related work
b u t has the advantage th a t a developer can develop a class lib ra ry and specify a protocol for a given class; users of this library th a n do not have to be concerned w ith th e choice of a protocol when they in sta n tia te an object.
2.2.4
R u n t im e a d a p ta b ility
R untim e a d ap tab ility describes th e ability to a d a p t a system a t ru n tim e to th e requirem ents exhibited by an application. T his kind of a d a p ta b ility is achieved by analyzing th e ru n tim e behaviour of an application and by changing the underlying protocol according to the results of this analysis.
Tn contrast to system s w ith object-based ad ap tab ility , ru n tim e a d a p ta b ility does not require aim otations from th e developer. However, the analysis of ru n tim e behaviour has a perform ance cost. T h is cost has to be outweighed by th e gain th a t th e change from one protocol to another can produce.
2 .2.5
D y n a m ic a d a p ta b ility
Dynam ic a d ap tab ility rejiresents a com bination of o b ject-based a d a p ta b ility and ru n tim e adaptability. T he assignm ent of a protocol to a p a rticu la r o b ject can be changed du rin g ru n tim e. T his change can be determ ined either by a ru n tim e analysis of th e sharing characteristic of an object or by an n o ta tio n s given by a developer.
O rca provides a ru n tim e environm ent th a t m onitors th e access characteristics of in d iv id ual objects and decides on th e basis of heuristics if a change from one protocol to a n o th e r would bring a benefit. A dditionally, O rca provides a p rogram m ing language th a t allows an notations to shared objects. W ith this m echanism it allows th e developer to help th e ru n tim e environm ent to find an ap p ro p riate protocol.
2.2.6
Sum m ary o f A d a p ta b ility C ategories
Very few DSM systems achieve a high degree of adaptation. This is due to a num ber of issues. Research im plementations of DSM systems are built to prove a very specific point. Very few systems are intended to prove th a t adaptability is an issue and th a t a certain adaptation mechanism provides an improvement over existing systems. A nother issue is th a t hardw are implementations prevent the introduction of new algorithm s. Hardware-based im plem enta tions of DSM system have a fixed number of protocols implemented. These protocols can not be modified or extended because of the nature of hardware.
In the following section we will describe a range of types of DSM systems. In the progress of this section it will become clear th a t the development of the architecture of DSM systems moved from hardware-based DSM systems towards software-based DSM systems th a t provide increasingly adaptability.
2.3
T y p es o f D S M sy ste m s
The general model of a distributed system (as sliown in figure 2.1) is made up of a num ber of proc(!Ssor nodes th a t are linked by an interconnection.
An interconnection supports the communication of processor nodes with each other and can have a variety of forms. One extreme form of an interconnection is a bus-based architec ture where all nodes share a common medium. All nodes exchange messages over this medium anfi are able to listen to all messages the medium transports. This arrangem ent has two sig nificant characteristics: 1) every coimnunication affects all nodes and 2) communication of one node with a group of nodes is very cheap. The other extreme form of interconnection is a crossswitch where each node is linked with every other node with a direct, individual connection. This form allows two nodes to communicate w ithout interfering with the com- nmnication of other nodes; however, the communication of one node to a num ber of other nodes is more expensive th an in the former case.
mes-C hapter 2. Related work
sages. An example of this is the arrangem ent of of one of the extreme forms in a hierarchical order.
Application
Shared memory
Node I Node 2 Node 3 Node 4 Node n
Network
F ig . 2.1 : General model of shared memory
A shared memory system introduces a shared state into this model. The shared state is ini[)lcniented by a shared main memory. The connm mication between processor nodes and the main memory is transported over the interconnection. In the simplest case a main memory has one connection to the interconnection. The characteristic of the bottleneck increases if either the interconnection or the connection to the main memory supports only a conunimication with one processor at a time
Caches are introduced into this scenario to limit communication over the intercoiniection. They are co-located with processor nodes and hold copies of d ata from a location th a t has been accessed by a processor node before. In the simplest case, when a processor reads the same location again the cache can provide the d ata w ithout communicating w ith the main memory. A write operation to the same location should cause the cache and the main memory to be updated.
[image:34.518.46.480.97.394.2]different orders.
In his paper on sequential consistency [63] Lam port identifies 3 possible scenarios th a t may lead a processor to read incorrect data:
• A value is w ritten to one memory location and another memory location is read sub sequently. The read operation may be served before the write operation has been propagated throughout the system. This may lead to another processor reading the old value of the first memory location. This scenario may cause problems if the read operations are used to determ ine the entry to a critical section.
• A processor writes a value into its cache where it is stored b ut not immediately propa gated to the main memory. A nother processor th a t accesses the same memory location is provided with a different value.
• A processor writes a value into its cache and the value is propagated to the main memory. Another processor reads the value of the same memory location th a t has been stored in its cache.
This anomaly can be seen as the source for the development of distributed shared memory architectures. The research in DSM systems has produced a number of different systems with a variety of individual protocols and target areas. Each of these systems implements the abstraction of caches in a different way. The systems can be classified into general categories according to their overall im plem entation as follows:
• Hardware-assisted shared memory
• Page-based software systems
• Object-based systems
2.3.1
Hardvi^are-assisted shared m em ory
Chapter 2. Related work
A bus connects a number of nodes with a single com munication medium. All nodes share this medium. A message th a t is created by one node is propagated along the bus to all other nodes. This allows all nodes to hsten to all communication on the bus.
Node Node Node
Cache
Cache mgmt Processsor
Cache mgmt Cache Processsor
C ache mgmt Cache Processsor
Main memory
F ig . 2.2: Schematic of a bus-based DSM system
Figure 2.2 depicts a typical model of a bus-l>ased shared memory system. The system consists of a immber of processors nodes and a main memory. The processor nodes contain a processor with a co-located cache and a cache management unit. The cache management miits and the main memory connnimicate using the bus.
A read memory access by one of the processor is handled by the cache. If the cache cannot satisfy the access the cache management attem p ts to retrieve the d ata from the main memory or from one of the other caches. A number of protocols have been put forward to handle the exchange of d ata between caches and main memory. These protocols are called snooping protocols because of their characteristic to listen to communication on the bus at all times. Common snooping protocols include the Illinois- [33], Berkley- [32], Firefly- [92] and Dragon-snooping protocols [9].
[image:36.518.50.485.140.408.2]Node 6
Node Node 5
Node?
Node 1 Node 2 Node 3 Node 4
F ig . 2.3 : 2-dim ensional m esh of a crossbar
Shared mem ory system s w ith crossbar arch itectu re place a num ber of nodes on one side of the mesh and a num ber of m em ory m odules on th e o th er side. T his way, in d ividual m em ory m odules can be accessed w ithout interfering m em ory accesses to o th er m odules. T his enhances concurrency. In troducing caches into this p ictu re is ugly because unlike on bus- based system s nodes cannot m onitor accessfis by o th er nodes and thus have to connnunicate w ith th(! mem ory m odules th a t th en m ight have to invalidate copies of d a ta kept by o th e r processors.
B oth, bus- and crossbar-technologies, exhibit lim ited scalability. Buses lim it th e m u n b er of nodes th a t can be connected to them because all nodes share th e sam e m edium a n d only one node can send over this m edium a t any tim e. O th er nodes th a t w ant to send som ething have to wait until a sender has finished and th en com pete for the right to be th e next sender. W ith an increasing num ber of nodes, the risk of com m unication conflicts grows an d the delay of com m unication over th e bus increases. C rossbar arch itectu res are lim ited in th e ir scalability because they require for each node th a t is added a connection to every o th er node. T he nm iiber of connections grows thus exponentially w ith the num ber of nodes a n d m ake this requirem ent only m anageable for a sm all num ber of nodes.
[image:37.518.40.497.59.331.2]Chapter 2. Related work
meiuoiy systems.
The D ASH system [67, 66, 65, 68, 44] combined bus-based shared memory systems with a general-purpose mesh. The bus-based shared memory systems are SGI Power Station 4D/340. These systems contain 4 processors w ith co-located caches and a shared main- memory. The consistency of the main-memory and the caches is m aintained by the Illinois- or MESI (modified, exclusive, shared, invalid)-snooping protocol.
A memory access is handled by a local cache first. If the local cache cannot satisfy the memory access all caches register the access. A w rite access to a particular cache line causes an invalidation of cache hues for the same address on other processor caches. On a read of such an invalidated cache line a processor requests an update from the shared memory. DASH combined a num ber of these hardware-based shared memory systems to a cluster by connecting them with a 2D mesh. The connection among machines is provided by a directory controller (DC). Each node in the mesh holds a predefined address range. The DC keeps track of all memory accesses and resolves all memory accesses to address ranges th at are not m aintained by the local machine. A memory access to a remote memory address involves conununic.ation over the mesh to the node th a t acts as home for the address range.
The interesting aspect of the DASH system is the combination of bus-based shared mem ory systems and a general-purpose network mesh. This combination allows the DASH systems to avoid the lim itation of bus-based systems and creates a 3-tier structure for memory ac cesses. This 3-tier stru ctu re introduces new possibilities for performance gains by carefully locating the d ata on nodes th a t access these d ata most frequently. D ata placement has been an issue of research before but DASH introduced another dimension into the known scenario by having 2 platform s in which the placement makes a difference. By placing the d a ta on a 4-processor node th a t makes most use of these d ata another performance gain is possible.
Processor Processor
Processor Processor
I Local C ache Local Cache
i^ms
Local Cache DirectoryLocal Cache f. ; I Local Cache ; ;
Local Cache Directory Local Cache Directory Local Cache Directory ALLCACHE Engine : 0
ALLCACHE R outer Directory Ring 0 ALLCACHE Router D ire cto ^ ALLCACHE Group: 0 Directory ALLCACHE Group: 0 Directory ALLC A C H E Engine : I Ring 1
F ig . 2.4: KSR-1 A rch itectu re
is 128 bytes. T he processor was clocked a t 20MHz a n d each node had a peak perform ance of 20 M IPS.
Nodes in the KSR-1 are connected in u n id irectional rings w ith a capacity of 1 G iga byte/sec. These; rings are called ring:0 or leaf rings. E ach ring can hold 32 nodes a n d can be connected to other rings by a second-level ring and th u s form a hierarchical s tru c tu re of rings. Second-level rings are called rin g :l and have a capacity of 1-4 G igabyte/sec. A configiuation of a KSR-1 could range from 8 to 1088 nod<!S and have a peak perform ance of 21,750 M IPS.
T he nieniory of th e KSR-1 is m ade up from the second-level caches of all nodes. All d a ta in th e systfun is held in th e caches of individual nodes an d no node possesses a m ain memory. T his type of m em ory system is called a cache-only m em ory a rch itectu re (COM A) [89, 46, 84]. T h e consistency of all caches in the system is m aintained by the A LLCA CH E m em ory system . T he ALLCA CHE system ensures th a t all accesses to d a ta occur according to sequential consistency.
T he A LLCA CHE system keeps track of cache lines by m aintaining directories of the location of cache lines for each ring. Ring controllers th a t connect a ring:0 to a ring:l m ain tain a directory for all pages in a ring. A m em ory access th a t cam iot be satisfied by a local cache causes a search for th e cache line in the local ring. If the access cannot b e satisfied w ithin a ring, a search th ro u g h all group directories in a rin g :l is invoked.
[image:39.518.37.483.71.438.2]Chapter 2. Related work
of a ring structure. The scalabihty of a ring stru ctu re is similar to th a t of a bus: The round- trip time for a message increases with the immber of nodes; the addition of too many nodes impedes the performance of the system dramatically. The KSR-1 avoids this scalability issue by applying a m ulti-tier stru ctu re in form of a hierarchical organization. This approach gives the KSR-1 similar characteristics to the DASH system in term s of d ata placement: The time of accesses to d ata increases with the num ber of tiers th a t have to be passed. However the organization of nodes into rings differs from bus-based systems in two ways: All messages have to pass all nodes between the sender and the recipient; but compared to bus-based systems not all nodes see a message on the ring. This means th a t snooping protocols are more complex to implement.
SC I [49] represents a more recent im ijlementation of hardware-based shared memory sys tem. SCI provides an interconnection th a t is connected to the memory of a node through the Direct Memory A rchitecture (DMA). The user can share memory between individual nodes by mapping memory from a remote node into the virtual address space of local processes. In the event of a memory access to a remote address an SCI interface identifies the nature of a remote meniory access, determines tlie node th a t needs to be contacted and realizes the memory transfer.
To summarize this section: Most hardware-based systems employ a single protocol. This protocol cannot be changed during the lifetime of the system. The granularity of sharing is fixed to the size of caclie lines and to page sizes. The development of hardware-based DSM systems is mostly concerned with scalability issues.
2.3.2
P age-b ased softw^are system s
loading the memory contents th a t is expected a t the virtual address into physical memory and mapping the virtual address to the physical address where the d ata has been stored.
The memory address space is divided into memory pages for adm inistrative purposes. A system fills up available physical memory pages until a threshold is reached. A number of memory pages th at have been filled may not be accessed very often. The contents of these pages can be moved to a hard disk and their space m ade available for other pages. The swapped-out pages are brought into the physical memory only when they are accessed again.
Many hardware platforms provide a memory m anagement unit (MMU) to support virtual memory. This MMU handles the translation from virtual to physical addresses. A memory access triggers a function in the MMU th a t verifies if the memory page referenced by a virtual address is in physical memory. If the page is not in physical memory a function in the operating system is executed to allow the operating system to transfer memory contents from the hard drive into physical memory.
A number of DSM systems use virtual memory mechanisms to regulate the access to shared data. The notification of accesses to invalid memory pages can be used to handle ac.cesses to pages that are hekl on remote nodes. Instead of causing the system to swaj) in a memory page from disk the system acquires a valid copy of the page from a rem ote note.
Systems th at use this mechanisms can be implemented in a variety of ways as shown in [72, 19, 82]. The implementation methods can be classified into two main categories: kernel-space and user-space implementations. Kernel-space im plem entations are more efficient because the access to adm inistrative structures for virtual memory in the kernel does not require context switches. However, kernel-space im plem entations have the disadvantage th a t there is one implementation of the system for the whole operating system. Thus, a kernel-space im plem entation imposes a single protocol on all applications.
Cha])ter 2. R elated work
Tlie Shared M emory Server implemented by Forin [38] et al for the Mach micro-kernel [24, 25] attem pts to combine the advantages of kernel-space im plem entation and user-level control. Mach exports the interface for virtual memory managers to the user-level and lets a user define individual virtual memory managers. This allows a user to install a memory manager th at is suitable for the particular sharing characteristics of an application.
The TreadMarks [57, 58, 6] system represents an im plem entation of a DSM system as a user-level library. The library is based on Unix System V system calls th a t provide a mechanism to control the access rights to portions of memory. The niprotect() system call provides a mechanism to set the protection bits of virtual pages. This mechanism allows the interception of accesses to invalid pages and the execution of functions th a t retrieve valid copies of these pages.
The im plem entation of TreadMarks as user-level library allows users to execute applica tions independently from each other. A program using TreadM arks is started at one node. This program determines a list of available machines and the execution environm ent of Tread Marks spawns a copy of the program on each of these machines. This mechanism resembles the /^;r^'-niechanism of Unix environments with the difference th a t each copy of the program is executed on a different node. The program use functions provided by TreadM arks to allocate memory on the local machine and to declare it shared. The imderlying execution environment of TieadM arks distributes the inform ation about the shared memory sections to the other programs. The consistency of the shared memory is m aintained by exploiting the interception of page fault. A page fault causes an exectition of a function in the TreadM arks execution environment. This function ensures the consistency of the inform ation th a t was reciuested when the page fault occurred.
fine grained in these system s. Softw are-based system s, however, have the problem th a t m on itoring accesses is expensive because w ith every access op eratio n a d d itio n al op eratio n s have to be executed to ensure th e validity of th e access. T his makes consistency m odels th a t do not rely on m onitoring individual accesses like relaxed consistency m odels m ore a ttra c tiv e to softw are-based system s. These consistency m odels, however, depend on inform ation ab o u t th e beginning and end of critical sections.
A nother critical issue for page-based system s is th e fixed large g ran u larity of page sizes. T his fixed gran u larity introduces two m ain issues: false sharing and heterogeneity problem s.
False sharing[22, 34] occurs w hen two indep en d en t d a ta item s are placed w ithin th e sam e page. A w rite op eratio n to one d a ta item causes generally all o th er copies of th is page to be invalidated. In the case of two d a ta item s accessed by two nodes concurrently over some tim e th is can lead to a “ping-pong” effect in w hich th e page is tran sferred back and forth betw een th e two processors involved. T readM arks solves th e problem s by applying a m ethod developed first in M uiiin [14]: Each j)rocessor copies a page before an u p d a te is allowed. T his “tw in ” of the original is kept u n til the critical section of accesses is finished. T h en the tw in and m odified page are com pared and tlie m odifications or “diffs” are exchanged betw een interested nodes.
T he fixed g ran u larity of m em ory pages also causes problem s when different arch itectures are com bined in a heterogeneous envirom nent. An a rch itectu re can im plem ent e ith e r little or big endian byte-order and can im plem ent any size for a m em ory page. T h e com bination of different arch itectu res in a heterogeneous environm ent makes an exchange of m em ory page a n on-trivial issue if th e byte-order and page size is different am ong th e arch itectures involved.
Chapter 2. Related work
Ivy [71] is one of the first DSM systems based on a virtual memory mechanism and rep resents a typical and rather simplified approach to this kind of DSM system implementation. The introduction of m ultiple copies of shared d a ta items and the change of the node th a t hold the most up-to-date copy resulted in a discussion of ownership. Li and Hudak [73] discuss a number of possible ownership m anagement schemes. These schemes determ ine the node th a t manages the access to a page. Ownership management schemes can be coarsely categorized as fixed ownership m anagem ent and dynam ic ownership management.
Fixed ownership management determ ines the ownership of a page and the ownership of the page stays fixed with a node during the runtim e of the system. All write accesses to a given page have to be made through the owner; nodes th at are not owner of a page never get direct w rite access to this page. This mechanism simplifies the algorithm th a t is needed to keep the page consistent i.e. it is easy to linearize the access to a page. Everything is handled by one node thus all write accesses can be easily put in sequential order by this node. However, this simplified mechanism introduces a bottleneck and reduces the concurrency of write accesses. Because all writes have to go through the owner write can become blocked there.
Dynamic ownership management determines the ownership of a page during the runtime of a system. This allows the system to place a page where it is accessed most frequently. The mechanism introduces the problem of finding the owner of a page. These algorithm s can be subdivided into two categories: Centralized and distributed ownership management.
Centralized dynamic ownership management assigns a management node to every page. The node keeps track of where the page has moved and as central inform ation hub for the page. The node can determ ine access rights of the nodes th a t content for a page and can linearize the w rite accesses.
D istributed ownership m anagem ents can be classified into dynam ic management and broadcast management.
a b o u t an owner of a page tu rn s out to be false, it provides a t least a “h in t” w here to find the owner of th a t page. Li and H udak show th a t, in th e w orst case, th e num ber of messages to locate a page depends on the num ber of tim es a page was searched for an d on th e num ber of processors contending for th e page. T h e w orst case according to Li a n d H udak is 0 ( p + K log p) w here p is th e num ber of processors and K is th e num ber of tim es th e page was searched.
T h e use of b ro adcast in d istrib u te d ow nership m anagem ent is sim ilar to th e use of buses in hardw are-based shared m em ory system s. A faulting processing node b ro ad casts a request to find th e present owner of a page. T ann en b au m et al [91] show w ith O rca th a t broadcasts in certain system s can be inexpensive w hen su p p o rte d efficiently by hardw are. However, b ro ad cast is very expensive if it is not su p p o rt by th e underlying netw ork in fra stru ctu re .
Mirage [36, 35] introduced a tim ed ow nership for pages. A system provides every node w ith a tim e sim ilar to tra d itio n al C P U tim e slice in m ulti-process environm ents. Every node can hold a page for a given tim e. If a request for this page is received du rin g this tim e the tim e-slice is com pleted and th e ow nership of the page is th e n tran sferred to th e requesting node.
T h e M irage system is im plem ented as an extension of th e Locus o p e ra tin g system on VAX ll/7 5 0 s . T h e Locus system is a System V com pliant Unix im plem entation. T h e page size in this system is 512 bytes. Every page in the system is p a rt of a larger segm ent. Segm ents can be a tta c h e d into the address space of a process.
Single-address space system s [94, 87, 47, 27, 29] can be based on a v irtu a l m em ory ab stra c tio n sim ilarly to page-based system s. However, page-based system s provide each process on a node w ith its own address space. A page on a node m ay be accessed by two program s. T h e m em ory m anager on th is node has to m ain tain the s ta te of the page for b o th program s and resolve any conflicts betw een these program s.
Chapter 2. R elated work
2.3.3
O bject-based sy stem s
A th ird class of DSM sy stem s in c lu d e s o b je c t-b a s e d sy stem s. T h e se sy ste m s use th e a b s tra c tio n s of o b je c t-o rie n te d lan g u ag e s a s a basis.
O ne of th e im p o rta n t a sp e c ts o f o b je c t-o rie n ta tio n in th is c o n te x t is th e e n c a p su la tio n o f d a t a in o b jects. T h e d a t a ite m s t h a t m ake u p a n o b je c t ca n o n ly b e accessed th ro u g h m e th o d s th a t are p ro v id ed by th e o b je c t. T h is allow s th e p ro g ra m m e r to c o n tro l th e access to th ese d a ta item s a n d to in tro d u c e in s tru c tio n for th e c o n siste n c y m a in te n a n c e if needed. T h u s an o b je c t-o rie n te d D SM sy ste m ca n b e im p le m e n te d w ith o u t re ly in g o n s u p p o r t from th e u n d e rly in g o p e