Lecture Notes in
Computer Science
Edited by G. Goos and J. Hartmanis
105
D.W. Davies E. Holler E.D.Jensen S. R. Kimbleton
B. W. Lampson G. LeLann K. J. Thurber R. W. Watson
Distributed Systems
Architecture
and Implementation
An Advanced Course
FBMathematikTUD 58312924Edited by B. W. Lampson, M. Paul, and H. J. Siegert
Fachbereich MathematlK
Techniscbe Koc.occhu.'e DarmstadtBib'iothek
.nv.-Nr.g 48 SO4
Springer-Verlag
Contents
1. Motivations, objectives and characterization of distributed systems 1
Gerard LeLann, INRIA Projet Pilote Sirius
1.1. Motivations 1
1.1.1. Technological changes 1 1.1.1.1. Microelectronics technology 1 1.1.1.2. Interconnection and communication technology 1 ° 1.1.2. User needs 2
1.2. Objectives 3
1.2.1. Increased performance .4 1.2.2. Extensibility .4 1.2.3. Increased availability 5 1.2.4. Resource sharing ... .5 1.2.5. Comments 51.3. Characterization 6
1.3.1. What is distributed? 6 1.3.2. What is new? •. 82. Distributed system architecture model :...'. 10
Richard W. Watson, University of California Lawrence Livermore Laboratory
12.1. Introduction . 10
2.2. Layers and interfaces 12
2.3. . Abstract objects as a unifying concept 15
2.4. The model layers 17
2.4.1. Introduction 17 2.4.2. Need for a distributed operating system 18 2.4.3. Application layer 19 2.4.4. Distributed operating system service layer 21 2.4.5. The interprocess communication layer 26 2.4.6. Hardware/firmware Components 31
2.5. Issues common to all layers 32
2.5.1. Introduction 32 — 2.5.2. Identifiers (naming) 32 2.5.3. Error control ; 33 2.5.4. Resource management 35 2.5.5. Synchronization ; 37 2.5.6. Protection .'.. 37 2.5.7. Object representation, encoding, translation 39 2.5.8. Testing, debugging, and measurement ..42
2.6. Global implementation and optimization issues 42
2.7. Conclusions 43
VI CONTENTS
3. Interprocess communication layer: Introduction 44
Kenneth J. Thurber, Sperry Univac
3.1. Introduction 44 3.2. Transmission medium 44 3.3. Hardware paths 45 3.4. l i n k s 46 3.5. Intervenors 47 3.6. Protocols 47 3.7. Protocol properties , : 50 3.8. Interconnection structure ..-. 51 3.9. Multiplexing 54 3.10. Arbitration 54 3.11. Computer networks versus distributed computers 54 3.12. Summary 56
^4. Hardware interconnection technology 57
Kenneth J. Thurber, Sperry Univac
4.1. Introduction 57 4.2. Topologies 57 4.3. Point-to-point 57 4.4. Multi-point 57 4.5. Taxonomies 59 4.6. Distributed system interfaces 63 4.7. Path allocation 66 4.8. Bandwidth/throughput tradeoffs 79 4.9. More on protocols 79 4.10. Buffers 1 83 4.11. Case studies 83 4.12. More on networks versus distributed computers 83 4.13. Summary 83
5. Link level 86 Gerard LeLann, INRIA Projet Pilote Sirius
5.1 Introduction 86 5.2 HDLC 87 5.2.1 Frame structure JS7 5.2.2 HDLC elements of procedure 88 5.3 The Arpanet IMP-IMP protocol 90 5.4 The Cyclades MV8 protocol 92
6. Hierarchy 94
Donald W. Davies, Computing Technology Unit, National Physical Laboratory (6.1-6.5) Richard W. Watson, University of California Lawrence Livermore Laboratoryl (6.6-6.8)
6.1. Introduction 94 6.1.1. The problems of a hierarchy 97 6.2. Arpanet as an example 102 6.3. Addressing, routing and congestion in large mesh networks 104
CONTENTS VU 6.4. Topology optimization 107 6.5. Packet versus circuit switching 108 6.6. Datagrams and virtual circuits 109 6.6.1. Datagrams 110 6.6.2. Virtual circuits 112 6.6.3. Datagrams vs virtual circuits 115 6.7. Network interfaces '. 118 6.7.1. Introduction 118 6.7.2. The pseudo device interface strategy 119 6.7.3. Importance of symmetry 120 6.7.4. Need for error checking at all levels 120 6.7.5. Flow and congestion control 120 6.7.6. Full duplex interface 121 6.7.7. Datagram versus virtual circuit interfaces 121 6.7.8. Xerox PUP as an example datagram interface and service 121 6.7.9. X.25 as an example vc interface 121 6.7.10. Implications of X.25 for distributed systems 124 6.7.11. Network frontends 127 6.8. Distributed systems and internetwork design issues 129 6.8.1. Introduction 129 6.8.2. Levels of network interconnection 129 6.8.3. Conclusions J.38
7. IPC interface and end-to-end protocols 140
Richard W. Watson, University of California Lawrence Livermore Laboratory17.1. Introduction 140 7.2. IPC service 141 7.2.1. Desired IPC characteristics 141 7.2.2. The IPC interface ..<. 141 7.3. Example IPC service model '. 145 7.4. Underlying IPC environment 147 7.5. Services required by an EEP of the next lower level 149 7.6. Levels of end-to-end services 149 7.7. • Origin, destination identifiers 151 7.8. EEP data objects and data stream synchronization marks 152 7.9. Error control and EEP state synchronization 153 7.9.1. Introduction 153 7.9.2. Error types and implications 154 7.9.3. Need for end-to-end error assurance 156 7.9.4. Error control mechanisms used while a connection exists 156 7.9.5. Connection management 161 7.9.6. Comparison of three-way-handshake and timer approaches 166 7.9.7. Bounding maximum-packet-lifetime 167 7.9.8. Reliable control information 168 7.10. Protection 169 7.11. Resource management •. 169 7.11.1. Introduction 169 7.11.2. Identifier space 169 7.11.3. Segmentation and reassembly : 169 7.11.4. Flow control 171 7.11.5. Priority .173
Vlll CONTENTS
7.12. Measurement and diagnostics 173
7.13. Conclusions 174
8. Distributed control 175
E. Douglas Jensen, Computer Science Department, Carnegie-Mellon University
8.1. Abstract 175
8.2. Introduction 175
8.3. The control space 178
8.4. Communication and the decentralization of control 187
8.5. Acknowledgement 190
9. Identifiers (naming) in distributed systems 191
Richard W. Watson, University of California Lawrence Livermore Laboratory
l9.1. Introduction 191
9.2. Identifier goals and implications 195
^ 9.3. Unique machine-oriented identifiers 197
9.4. Human-oriented names 203
9.5. Addresses and routing 206
9.6. Conclusion 210
10. Protection 211
Donald W. Davies, Computing Technology Unit, National Physical Laboratory
10.1. Basic protection needs 211
10.1.1. Protection in distributed systems 212
10.2. Single key cryptography and the Data Encryption Standard 214
-10.2.1. Measuring the strength of a cipher system 215 10.2.2. The data encryption standard 216 10.2.3. Block and stream ciphers...; 218 10.2.4. Block chaining : .219
10.3. Application of a cipher system at different levels in the hierarchy 220
10.3.1. Key distribution .222
10.4. Public key cipher systems 224
10.4.1. The discrete exponential function 226 10.4.2. The power function and its use in cryptography 228 10.4.3. The public key cipher of Rivest, Shamir and Adleman 228 10.4.4. The need for a digital signature 230 10.4.5. The registry of public keys .-.233 10.4.6. Other public key ciphers and signatures 234
10.5. Access control :. 235
10.5.1. The access matrix .235 10.5.2. The access control list ; ..236 10.5.3. Capabilities 211 10.5.4. Access control lists combined with capabilities 237 10.5.5. A simplified model for changing access rights 237 10.5.6. Capabilities in a distributed system 24Q 10.5.7. The location of access control in a distributed system 242
CONTENTS IX
11. Atomic transactions 246
Butler W. Lampson, Xerox Palo Alto Research Center
211.1. Introduction 246
11.2. System overview 247
11.3. Consistency and transactions 248
11.4. The physical system 249
11.4.1. Disk storage 250 11.4.2. Processors and crashes 251 . 11.4.3. Communication 252 11.4.4. Simple, compound and restartable actions 253
11.5. The stable system 254
11.5.1. Stable storage 254 11.5.2. Stable processors 256 11.5.3. Remote procedures 257
11.6. Stable sets and compound actions 257
11.6.1. Stable sets .258 11.6.2. Compound atomic actions .-. 259
° 11.7. Transactions 260
11.8. Refinements 263
11.8.1. File representation 263 11.8.2. Ordering of actions 264 11.8.3. Aborts 26412. Synchronization., 266
Gerard LeLann, INRIA Projet Pilote Sirius
12.1. Introduction 266
12.2. Consistency and atomicity 266
12.3. Event ordering and atomicity .:. 267
12.3.1. Partial and total orderings 267 12.3.2. Atomic operations 268
12.4. Synchronization 268
12.5. Synchronization and types of computing systems 270
.12.5.1. Fully replicated computing 270 12.5.2. Strictly partitioned computing 270 12.5.3. Partitioned and partially replicated computing 271
12.6. Event ordering—examples 271
12.6.1. Link protocols 272 12.6.2. Executives 272 12.6.3. Database system nucleus 272
12.7. Synchronization mechanisms for distributed systems 273
12.7.1. Centralized versus decentralized synchronization 273 12.7.2. Centralized mechanisms 274 12.7.3. Decentralized mechanisms 276
12.8. Evaluation criteria 282
13. Multiple copy update 284
Elmar Holler, Instiiutfu'r Datenverarbeitung, Kernforschungszentrum Karlsruhe
13.1. Introduction ..; 284
13.2. Basic architecture of multiple copy update mechanisms 286
13.3. Solutions to the multiple copy update problem 289
X CONTENTS
13.3.1. Voting solutions 291 13.3.2. Non voting solutions : 300
D.4. Verification of solutions 302
13.5. Evaluation of solutions 303
14. Applications and protocols 308
Stephen R. Kimbleton and Pearl Wang, National Bureau of Standards 3 (14.1-14.6)
Butler W. Lampson, Xerox Palo Alto Research Center (14.7-14.9)
14.1. Introduction 308 14.1.1. Supporting program access to data 309 14.1.2. Distributed applications .309 14.2. Database management systems (DBMSs) 310 14.2.1. The need for DBMSs 310 14.2.2. DBMS differences 311 14.2.3. Datamodels ; 311
14.2.3.1. The relational data model 311 14.2.3.2. The hierarchical data model 313 14.2.3.3. The Codasyl data model 314 14.2.4. Data manipulation languages 316 14.3. Network virtual data managers 318 14.3.1. NVDM desirability and structure .7. 320 14.3.2. Constructing the network-wide view of data 322 14.3.3. Data integrity 325 14.3.3.1. Controlling access .- 326 14.3.3.2. Maintaining meaning 327 14.3.3.3. Simplifying specification 328 14.3.4. XNDM - An Experimental Network Data Manager 328 14.4. Translation technology 330 14.4.1. Nature of the translation function 331 14.4.2. Translation alternatives : 333 14.4.3. The query translation process—an informal description 334 14.4.4. A taxonomy of major translation issues : 336 14.4.5. Implementation approach 338 14.4.6. XNQL translator specifics 339 14.5. Data transfer protocols (DTPs) 343 14.5.1. DTP services 344 14.5.2. Data and its translation/transformation 346 14.5.3. Implementing a data translator/transformer 347 14.5.4. A data transfer protocol 350 14.6. An e x a m p l e - t h e NBS experimental network operating system 352 14.6.1. XNOS overview 352 14.6.2. Supporting uniform system-system interactions 353 14.7. Parameter and data representation 357 14.7.1. Types 358 14.7.2. Binding 359 14.7.3. Encoding : 360 14.7.4. Conversion : 361 14.8. Debugging, testing and measurement 361 14.8.1. A remote debugger 363 14.8.2. Monitoring communication 364 14.8.3. Eventlogs ;. 365 14.9. Remote procedure calls : 365
CONTENTS Xi 14.9.1. The no-crash case 366 14.9.2. The crash case 368
15. Error recovery 371
Gerard LeLann, INRIA Projet Pilote Sinus
15.1. Introduction 371 15.2. Basic concepts and definitions 371 15.3. Error recovery 372 15.4. The Tandem/16 computing system 373 15.5. Sirius-Delta '. 374
15.5.1. Transaction commitment 374 15.5.2. Recovering from failures 375 15.5.3. Unrecoverable faults and failures 376
16. Hardware issues 377
Kenneth J. Thurber, Sperry Univac 4
'•=> 16.1. Introduction 377 16.2. Design issues 378 16.3. Executive control functions 378
16.3.1. Approach '.... 378 16.3.2. Hardware concept 382 16.3.3. Implicit primitives 390 16.3.4. Explicit primitives 391
16.4. Functionally Distributed Architectures (FDA) 395 16.5. Control overhead 396 16.6. Virtual Machines (VM) and Virtual Machine Monitors (VMM) 399 16.7. Local networks .". 405
16.7.1. Introduction : .405 16.7.2. Application embedded hardware 405 16.7.3. Tum key systems i .406 16.7.4. Subsystem building blocks .406 16.7.5. Components/modules .406 16.7.6. Chips : .406 16.7.7. Hardware overview .407 16.7.8. Network Systems Corporation HYPERchannel 408
16.8. Further issues 411
16.9. Conclusion 411
17. Hardware/software relationships in distributed systems 413
E. Douglas Jensen, Computer Science Department, Carnegie-Mellon University
17.1. Introduction 413 17.2. Assigning functionality to layers i 413 17.3. The implementation of functions within layers 415 17.4. Hardware/software relationships in distributed computer systems 416
17.4.1. Bus bandwidth .417 17.4.2. Bus medium .417 17.4.3. Broadcasts .418 17.4.4. Acknowledgment deferral 418 17.4.5. Transmission addressing .419 17.4.6. Communication support .419
xii CONTENTS
17.4.7. Bit/word/transmission synchronization 420
17.5. Conclusion 420
18. The National Software Works (NSW) 421
ElmarHoller, InstitutfurDatenverarbeitung, Kernforschungszentrum Karlsruhe518.1. Introduction , 421 18.2. System architecture 422 18.3. NSW components 425 18.3.1. MSG: The NSW interprocess communication facility 426 18.3.2. Front end: The NSW user interface 431 18.3.3. Foreman: providing the tool execution environment 433 18.3.4. File Package: The file handling facility for NSW 437 18.3.5. Works Manager: The NSW monitor .439 18.4. The NSW reliability concept 441 18.5. DAD: A debugging tool for debugging NSW 442
19. Ethernet, Pup and Violet 446
Butler W. Lampson, Xerox Palo Alto Research Center 619.1. The Alto and the Ethernet 446 19.1.1. Implementation .449 19.2. The Pup internetwork 450 19.2.1. Introduction .450 19.2.2. Design principles and issues 450 19.2.2.1. The basic model: networks connected with gateways 451 19.2.2.2. Simplicity 451 19.2.2.3. Datagrams versus virtual circuits .'. 451 19.2.2.4. Individual networks as packet transport mechanisms 452 19.2.2.5. Internetwork gateways '. 452 19.2.2.6. A layered hierarchy of protocols : ; 452 192.2.7. Naming, addressing, and routing 454 19.2.2.8. Flow control and congestion control 455 19.2.2.9. Reliable transport .456 19.2.2.10 Packet fragmentation 457 19.2.3. Implementation .457 19.2.3.1. Level 0: Packet transport - 457 19.2.3.2. Level 1: Internetwork datagrams 460 19.2.3.3. Level 2: Interprocess communication 464 19.2.3.4. Level 3: Application protocols 465 19.3. The distributed file system : 465 19.3.1. Introduction .465 19.3.2. Access mechanics 467 19.3.3. Client responsibilities 469 193.3.1. Server crashes and aborted transactions 469 19.3.3.2 Local caches of shared data :...470 19.3.3.3. Directories .472 19.3.4. Summary '. .473 19.4. Violet: A distributed, replicated calendar system ...:. 473 19.4.1. Introduction .473 19.4.2. Environment '. .474 19.4.3. System architecture ~.! .475 19.4.4. Replicated data :. .479 19.4.5. Sharing and locking 481 19.4.6. The performance of the architecture 482
CONTENTS XU1 19.4.7. Implementation notes .483 19.4.8. Conclusion .484