Effective Runtime Management of Parallelism in a Functional Programming Context
Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy, August 2000. Revised December 2001 and March 2002.

School of Computing, GPO Box 252-100, Hobart, Tasmania, AUSTRALIA 7001
Phone: +61 3 6226 2922
Fax: +61 3 6226 1824
Web: http://www.comp.utas.edu.au

Declaration

Chapter Three of this thesis is an expansion of material originally published in [Dermoudy 1999] from the Proceedings of the 6th Australasian Conference on Parallel and Real-Time Systems. Chapter Eight of this thesis is an expansion of material originally published in [Dermoudy 1996a] from the Proceedings of the 19th Australasian Computer Science Conference.

This thesis contains no material which has been accepted for a degree or diploma by the University of Tasmania or any other institution (except by way of background information and duly acknowledged in the thesis) and, to the best of my knowledge and belief, no material previously published or written by another person except where due acknowledgment is made in the text of the thesis.

This thesis may be made available for loan and limited copying in accordance with the Copyright Act 1968.

The content remains © Julian Dermoudy, 2000–2002.

Abstract

This thesis considers how to speed up the execution of functional programs using parallel execution, load distribution, and speculative evaluation. This is an important challenge given the increasing complexity of software systems, the decreasing cost of individual processors, and the appropriateness of the functional paradigm for parallelisation.

Processor speeds are continuing to climb, but the gains are outpaced by both the increasing complexity of software and the escalating expectations of users. Future gains in speed are likely to come from combining today’s conventional uni-processors to form loosely-coupled multicomputers.

Parallel program execution can theoretically provide linear speed-ups, but for this theoretical benefit to be realised two main hurdles must be overcome. The first is the identification and extraction of parallelism within the program to be executed. The second is the runtime management and scheduling of the parallel components to achieve the speed-up without slowing the execution of the program.

Clearly a lot of work can be done by the programmer to ‘parallelise’ the algorithm. There is often, however, much parallelism available without significant effort on the part of the programmer. Functional programming languages and compilers have received much attention in the last decade for their potential contribution to parallel execution. Since the semantics of languages from the functional programming paradigm manifest the Church-Rosser property (that the order of evaluation of sub-expressions does not affect the result), sub-expressions may be executed in parallel. The absence of side-effects and the lack of state increase the availability of expressions suitable for concurrent evaluation. Unfortunately, such expressions may involve varying amounts of computation or require large amounts of data, both of which complicate the management of parallel execution.

If the future of computation lies in the formation of multicomputers, we are faced with the high probability that the number of available processing units will quickly exceed the known parallelism of an algorithm at any given moment during execution. Intuitively this spare processing power should be utilised if possible. The premise of speculative evaluation is that it employs otherwise idle tasks on work that may prove beneficial. The more program components available for execution, the greater the opportunity for speculation and, potentially, the quicker the program’s result may be obtained.

The second impediment to the parallel execution of programs is the scheduling of program components for evaluation. Multicomputer execution of a program involves the allocation of program components among the available tasks to maximise throughput. We present a decentralised, speculation-cognate, load distribution algorithm that allocates and manages the distribution of program components among the tasks with the further aim of minimising the impact on tasks executing program components known to be required.

In this dissertation we present our implementation of minimal-impact speculative evaluation in the context of the functional programming language Haskell augmented with a number of primitives for the indication of useful parallelism.
We expound four novel schemes (two quantitative and two qualitative) for expressing the initial speculative contribution of program components and provide a translation mechanism to illustrate the equivalence of the four. The implementation is based on the Glasgow Haskell Compiler (GHC) version 0·29, the de facto standard for parallel functional programming research, and strives to minimise the runtime overhead of managing speculative evaluation. We have augmented the Graph reduction for a Unified Machine model (GUM) runtime system with our load distribution algorithm and speculative evaluation sub-system. Both are motivated by the need to facilitate speculative evaluation without adversely affecting program components that directly influence the program’s result.

Experiments have been undertaken using common benchmark programs. These programs have been executed under sequential, conservative parallel, and speculative parallel evaluation to study the overheads of the runtime system and to show the benefits of speculation. The results of the experiments, conducted using an emulated multicomputer, provide evidence of the usefulness of speculative evaluation in general and of effective speculative evaluation in particular.

Acknowledgments

AMDG.

There are many people I need to thank for their support: colleagues, friends, and relatives. To all those people at UTas who have helped and hindered: thanks! Thanks to fellow masochists Sam, Carl, Chris, Glenn, Mark, and Ian for the discussions and distractions. Thanks to colleagues Dave, John, Luke, Nicole, Pete, Andrea, Terry, David, Phil, and Kristy for your support and helpful advice. Special mentions to John for spending so much time on the vagaries of the installation of the Glorious Haskell Compiler, to Dave for his help with my daily battles with a certain software company, to Nicole for helping me debug my code, and to Kristy for her unique logic.

Thanks to my supervisors, Charles, Andrew, and particularly Vishv, for their continued faith, perseverance, criticism, enthusiasm, and support. Thanks to my HoDs over the years, Young, John, Thong, Chris, Tony, and Arthur, for their support and assistance. Thanks to statistician Simon Wotherspoon for his help with experimental design and statistical analysis. And ‘thanks’ to all of my students for being so interested in my units that they prevented me from working on this!

I also want to thank those people who have helped greatly with technical advice over the years: principally Hans-Wolfgang Loidl (University of Glasgow/Heriot-Watt University), without whom this would not have been completed, and also Kevin Hammond (University of St Andrews), Phil Trinder (Heriot-Watt University), David King (The Open University/Heriot-Watt University), Simon Marlow (University of Glasgow/Microsoft Corporation), Simon Peyton Jones (University of Glasgow/Microsoft Corporation), Fernando Rubio (Universidad Complutense de Madrid), and Will Partain (University of Glasgow).

A special thanks is also due to the anonymous markers of this document. This thesis is far from perfect. I am indebted to the markers for their acknowledgment of its successes, their bolstering of its strengths, their reduction of its weaknesses, and their grace in its flaws. There were many additional things I wanted to accomplish in this thesis and the University of Tasmania’s Research Higher Degrees Unit must be acknowledged for curtailing this list. I hope to complete more of the tasks in the future.

Thanks to those who have helped with the production of this thesis, either through typing (my wife Nicole), through proof-reading (Nicole again, Vishv, Richard, Charles, and Sam), or through marking it (I don’t know who you are but I assure you the cheques are in the mail). Thanks also to my style consultants Nicole and Chris!

Thanks to the fencers and the soccer players for the physical exhaustion, the Wizards of the Coast for their money-making enterprise, the brewers from Tasmania and Ireland, and the distillers from Scotland for their medicinal prevention of dehydration and insanity.

Thanks to my close friends and my family. In particular, thanks to Mum, Dad, Joan, Gran, and to Kristen for their love, support and understanding. Unfortunately, there are also some people who did not see the fruition of this document, in particular my brother Simon, my maternal grandparents, and my paternal grandfather. I miss you and know that you’re proud of me.

Finally, I’d like to thank my wife Nicole for everything.
I have mentioned that Hans-Wolfgang’s assistance was crucial; Nicole’s was no less so. I am humbled by your selfless attitude and sacrifice. Thank you for your love, food, support, food, assistance, food, understanding, food, tolerance, food, and patience. I’m turning off the computer now, Nic, and this thesis document is for you. See you soon.

- Julian
(11) Contents. Contents Chapter One: Introduction...................................................................... 1 1·1. Motivation..........................................................................................................1 1·1·1. The Problem ..............................................................................................1. 1·1·2. The Solution...............................................................................................2. 1·1·3. 1·1·2·1. Philosophy..................................................................................... 2. 1·1·2·2. Language........................................................................................ 2. 1·1·2·3. Architecture................................................................................... 3. Alternatives ................................................................................................4 1·1·3·1. Paradigm........................................................................................ 4. 1·1·3·2. Parallelism...................................................................................... 5. 1·2. Aims ....................................................................................................................7. 1·3. Overview ............................................................................................................7. 1·4. Structure of the Thesis .....................................................................................9. Chapter Two: Parallelism in Functional Programs ...............................11 2·1. The Functional Programming Paradigm .....................................................11 2·1·1. ‘Horses for Courses’ .............................................................................. 11. 2·1·2. Paradigms ................................................................................................ 11. 2·2. 
Inherent Parallelism ........................................................................................14. 2·3. Graph Reduction.............................................................................................15. 2·4. Software and Hardware Architectures .........................................................15 2·4·1. Processor and Memory Models ........................................................... 15 Page ix.
(12) Effective Runtime Management of Parallelism in a Functional Programming Context 2·4·1·1. SISD Machines ........................................................................... 16. 2·4·1·2. MISD Machines.......................................................................... 16. 2·4·1·3. SIMD Machines.......................................................................... 16. 2·4·1·4. MIMD Machines ........................................................................ 17. 2·4·2. Topology ..................................................................................................18. 2·4·3. Specialised Hardware .............................................................................20. 2·4·4. General-purpose Hardware...................................................................21. 2·5. Granularity .......................................................................................................22. 2·6. Specifying Parallelism.....................................................................................24. 2·7. 2·6·1. Motivation................................................................................................24. 2·6·2. Static Analysis..........................................................................................24. 2·6·3. Empirical Evidence and Profilers ........................................................25. 2·6·4. Granularity Simulators ...........................................................................26. 2·6·5. Annotations .............................................................................................26. Effectiveness....................................................................................................27. Chapter Three: Speculative Evaluation................................................. 31 3·1. Conservative Evaluation ................................................................................31. 3·2. 
Speculative Evaluation Issues .......................................................................32 3·2·1. Introduction.............................................................................................32. 3·2·2. Complexities ............................................................................................33. 3·2·3. 3·3. Competition ................................................................................ 33. 3·2·2·2. Change ......................................................................................... 34. 3·2·2·3. Premature Exceptions ............................................................... 35. 3·2·2·4. Communication .......................................................................... 35. Solutions...................................................................................................36. Priority Schemes..............................................................................................37 3·3·1. Preliminary Definitions..........................................................................37. 3·3·2. Quantitative Schemes.............................................................................39. 3·3·3. Page x. 3·2·2·1. 3·3·2·1. Probability ................................................................................... 39. 3·3·2·2. Percentiles ................................................................................... 41. Qualitative Schemes ...............................................................................41 3·3·3·1. Black−White................................................................................ 41. 3·3·3·2. Levels of Speculation................................................................. 43. 3·3·4. Equivalencies...........................................................................................44. 3·3·5. Effectiveness ...........................................................................................46.
(13) Contents. Chapter Four: Load Distribution .......................................................... 47 4·1. Discrimination.................................................................................................47. 4·2. Mechanisms .....................................................................................................48 4·2·1. Mechanism Taxonomy.......................................................................... 48. 4·2·2. Static and Dynamic Load Distribution Algorithms.......................... 48. 4·2·3. Centralised and Decentralised Load Distribution Algorithms........ 49. 4·2·4. Cooperative and Non-Cooperative Load Distribution Algorithms............................................................................................... 51. 4·2·5. 4·3. 4·4. 4·5. Thread placement and Thread migration ........................................... 51. Mode of Operation.........................................................................................52 4·3·1. Operating Mode Enumeration............................................................. 52. 4·3·2. Sender-Initiated Load Distribution Algorithms ................................ 52. 4·3·3. Receiver-Initiated Load Distribution Algorithms ............................. 52. 4·3·4. Symmetrically-Initiated Load Distribution Algorithms.................... 53. 4·3·5. Adaptive Load Distribution Algorithms ............................................ 53. Constituent Policies ........................................................................................53 4·4·1. Introduction ............................................................................................ 53. 4·4·2. Load Estimation Policies ...................................................................... 54. 4·4·3. Information Policies .............................................................................. 54. 4·4·4. 
Initiation Policies.................................................................................... 56. 4·4·5. Location Policies .................................................................................... 56. 4·4·6. Selection Policies .................................................................................... 57. 4·4·7. Migration-Limiting Policies .................................................................. 57. 4·4·8. Transfer Policies..................................................................................... 58. Effectiveness....................................................................................................59. Chapter Five: Haskell & GHC .............................................................. 63 5·1. Introduction.....................................................................................................63. 5·2. Haskell ..............................................................................................................63. 5·3. 5·2·1. Origins ..................................................................................................... 63. 5·2·2. GPH ......................................................................................................... 64. GHC .................................................................................................................65 5·3·1. Introduction ............................................................................................ 65. 5·3·2. Compiler Structure................................................................................. 66. 5·3·3. Compiler Phases..................................................................................... 66. 5·3·4. Runtime Architecture ............................................................................ 68. Page xi.
(14) Effective Runtime Management of Parallelism in a Functional Programming Context. 5·3·5. 5·4. GUM.........................................................................................................68 5·3·5·1. Introduction ................................................................................ 68. 5·3·5·2. Operation .................................................................................... 70. 5·3·5·3. Thread Pool Structure and Operation..................................... 72. 5·3·5·4. Closure (and TSO) Structure and Composition .................... 73. 5·3·5·5. Annotations................................................................................. 75. 5·3·5·6. Spark Pool Structure and Operation ....................................... 75. 5·3·5·7. Spark Structure ........................................................................... 77. 5·3·5·8. Load Distribution....................................................................... 77. 5·3·5·9. Remote Behaviour...................................................................... 78. PVM..................................................................................................................81. Chapter Six: Related Work ....................................................................83 6·1. Introduction.....................................................................................................83. 6·2. Machine Models ..............................................................................................83 6·2·1. LML, the G-Machine, and the <ν,G>-Machine ...............................83. 6·2·2. The ALICE..............................................................................................84. 6·2·3. The ZAPP................................................................................................84. 6·2·4. The Four-Stroke Reduction Engine ....................................................85. 6·2·5. 
The TIM...................................................................................................85. 6·2·6. The MaRS ................................................................................................86. 6·2·7. Alfalfa and Buckwheat ...........................................................................86. 6·2·8. COBWEB and Norman ........................................................................87. 6·2·9. Roe’s Contribution .................................................................................88. 6·2·10 GAML ......................................................................................................89 6·2·11 The ABC and PABC Machines ............................................................89 6·2·12 The HDG-Machine................................................................................90 6·2·13 The Spineless G-Machine......................................................................91 6·2·14 The Spineless-Tagless G-Machine .......................................................91 6·2·15 GRIP.........................................................................................................92 6·2·16 Partridge’s Contribution ........................................................................92 6·2·17 The ν-STG Machine ..............................................................................93 6·2·18 Mattson’s Contribution..........................................................................93 6·2·19 The WYBERT Machine ........................................................................94 6·2·20 The mSTG Machine...............................................................................95 6·2·21 DREAM...................................................................................................96 6·2·22 GranSim ...................................................................................................97. Page xii.
(15) Contents. 6·2·23 Others ...................................................................................................... 97. 6·3. Parallelism Identification and Speculative Evaluation...............................98 6·3·1. Para-Functional Programming............................................................. 98. 6·3·2. Burton’s Contribution ........................................................................... 99. 6·3·3. Lazy Task Creation, Futures, and Transitive Priorities .................. 100. 6·3·4. Serial Combinators............................................................................... 102. 6·3·5. Evaluation Transformers .................................................................... 102. 6·3·6. Concurrent Clean ................................................................................. 103. 6·3·7. Partridge’s Contribution...................................................................... 104. 6·3·8. GRIP, the Spineless-Tagless G-Machine, GUM, and GPH ......... 105. 6·3·9. Mattson’s Contribution ....................................................................... 106. 6·3·10 Murthy and Rajaraman’s Contribution ............................................. 107 6·3·11 GranSim................................................................................................. 108 6·3·12 Evaluation Strategies............................................................................ 108 6·3·13 Eden and PEARL ................................................................................ 109. 6·4. Dynamic Load Distribution ....................................................................... 110 6·4·1. Casavant and Kuhl’s Taxonomy........................................................ 110. 6·4·2. Eager, Lazowska, and Zahorjan’s Contribution.............................. 110. 6·4·3. Bidding................................................................................................... 112. 6·4·4. 
Diffusion Scheduling ........................................................................... 113. 6·4·5. Stankovic’s Contribution .................................................................... 115. 6·4·6. Barak and Shiloh’s Contribution........................................................ 116. 6·4·7. Krueger and Livny’s Contribution .................................................... 116. 6·4·8. The PAM ............................................................................................... 117. 6·4·9. The HDG-Machine ............................................................................. 117. 6·4·10 Partridge’s Contribution...................................................................... 117 6·4·11 Suen and Wong’s Contribution.......................................................... 118 6·4·12 Aharoni, Feitelson, Barak and Farber’s Contribution .................... 118 6·4·13 Mattson’s Contribution ....................................................................... 119 6·4·14 Distributed Operating Systems.......................................................... 119 6·4·14·1 Milieu..........................................................................................119 6·4·14·2 Wisdom......................................................................................120 6·4·14·3 Sprite ..........................................................................................121 6·4·14·4 Utopia ........................................................................................121 6·4·14·5 Amoeba......................................................................................121 6·4·14·6 Condor.......................................................................................122. 6·4·15 Others .................................................................................................... 122. Page xiii.
(16) Effective Runtime Management of Parallelism in a Functional Programming Context. 6·5. Summary.........................................................................................................124. Chapter Seven: Effective Management of Speculative Evaluation ..... 125 7·1. Introduction...................................................................................................125. 7·2. Annotations ...................................................................................................128. 7·3. The Spark Pool..............................................................................................131. 7·4. 7·5. 7·3·1. Storage Overhead .................................................................................131. 7·3·2. Selection .................................................................................................132. 7·3·3. Banishing................................................................................................133. 7·3·4. Names.....................................................................................................133. 7·3·5. Thread Families.....................................................................................134. 7·3·6. Parent Threads ......................................................................................135. The Thread Pool ...........................................................................................135 7·4·1. Storage ....................................................................................................135. 7·4·2. Selection .................................................................................................136. 7·4·3. Names.....................................................................................................138. 7·4·4. Thread Families.....................................................................................138. 7·4·5. 
Parent Threads and Closure Entry.....................................................138 7·4·5·1. Preamble.................................................................................... 138. 7·4·5·2. Parent-Child Relationship Identification .............................. 139. 7·4·5·3. Priority Reckoning ................................................................... 139. 7·4·5·4. Thread Death............................................................................ 141. Scheduling ......................................................................................................142 7·5·1. Introduction...........................................................................................142. 7·5·2. Pre-emptive versus Non-Pre-emptive...............................................143. 7·5·3. First-In-First-Out..................................................................................143. 7·5·4. Last-In-First-Out ..................................................................................144. 7·5·5. Thread Families.....................................................................................145. 7·6. Discussion......................................................................................................146. 7·7. Summary.........................................................................................................148. Chapter Eight: Effective Implementation of Speculative Evaluation..151. Page xiv. 8·1. Introduction...................................................................................................151. 8·2. Parent Threads and Child Threads.............................................................152 8·2·1. Introduction...........................................................................................152. 8·2·2. Spark Parent Identification..................................................................152. 8·2·3. 
Thread Parent Identification...............................................................153.
8·2·4. Thread Priority Calculation and Adjustment ..... 154
8·2·4·1. Preamble ..... 154
8·2·4·2. Sparking ..... 154
8·2·4·3. Changing a Thread’s Priority ..... 155
8·2·4·4. Thread Termination ..... 157
8·2·5. Dynamic Thread Hierarchy Management ..... 158
8·2·6. Messages ..... 160
8·2·6·1. Preamble ..... 160
8·2·6·2. The CHILD Message ..... 160
8·2·6·3. The PRIORITY Message ..... 160
8·2·6·4. The PARENT Message ..... 161
8·2·6·5. The THREAD_GA Message ..... 161
8·2·6·6. The ZOMBIE and TSO_DEATH Messages ..... 161
8·3. Implementation Details ..... 170
8·3·1. Preamble ..... 170
8·3·2. Data Structures ..... 170
8·3·2·1. Annotations ..... 170
8·3·2·2. Names ..... 170
8·3·2·3. Sparks ..... 170
8·3·2·4. TSOs ..... 171
8·3·2·5. Closures ..... 171
8·3·2·6. Dynamic Thread Hierarchy Management ..... 171
8·3·2·7. Messages ..... 171
8·3·3. Algorithms ..... 172
8·3·3·1. Annotations ..... 172
8·3·3·2. Names ..... 172
8·3·3·3. Sparks and Spark Pools ..... 172
8·3·3·4. TSOs, Threads, and the Thread Pool ..... 173
8·3·3·5. Dynamic Thread Hierarchy Management ..... 174
8·3·3·6. Messages ..... 174
8·4. Discussion ..... 174
8·5. Summary ..... 177

Chapter Nine: Effective Load Distribution — Spark Percolation ..... 179
9·1. Introduction ..... 179
9·2. Deficiencies of “Fishing” ..... 179
9·3. Spark Percolation ..... 181
9·3·1. Overview ..... 181
9·3·2. Objective ..... 182
9·3·3. Load Metric ..... 182
9·3·3·1. Load Estimation ..... 182
9·3·3·2. Load Information ..... 183
9·3·4. Operation ..... 184
9·3·5. Message Types ..... 185
9·3·5·1. Preliminaries ..... 185
9·3·5·2. LOAD REQUEST ..... 185
9·3·5·3. LOAD STATUS ..... 186
9·3·5·4. DON’T DISTURB ..... 186
9·3·5·5. SPARK REQUEST ..... 186
9·3·5·6. SPARK ..... 187
9·3·5·7. NEGATIVE ACKNOWLEDGMENT ..... 187
9·3·6. Maintaining Load Information ..... 187
9·3·7. Policy Summary ..... 189
9·3·8. Improvements ..... 189
9·3·8·1. Throttling DON’T DISTURB Messages ..... 189
9·3·8·2. Resolving Local Maxima ..... 190
9·3·9. Algorithm ..... 194
9·3·9·1. Context ..... 194
9·3·9·2. Initiation ..... 194
9·3·9·3. Communication ..... 196
9·4. Examples ..... 204
9·4·1. Introduction ..... 204
9·4·2. Mandatory–Speculative Transition ..... 205
9·4·3. Transfer of a Mandatory Spark ..... 205
9·4·4. Transfer of a Speculative Spark ..... 206
9·4·5. Transfer of a Speculative Spark with Competition ..... 207
9·4·6. Thread Re-Stocking ..... 208
9·4·7. Inverted Thread Re-Stocking ..... 210
9·4·8. ‘Single-Depth’ Local Maxima on Mesh Topology ..... 211
9·4·9. ‘Double-Depth’ Local Maxima on Bus Topology ..... 215
9·4·10 ‘Double-Depth’ Local Maxima on Mesh Topology ..... 218
9·4·11 Heavy Load ..... 224
9·5. Spark Banishing ..... 226
9·6. Implementation ..... 226
9·6·1. Data Structures ..... 226
9·6·2. Algorithm ..... 227
9·7 Discussion ..... 228
9·8. Summary ..... 229

Chapter Ten: Experimentation ..... 231
10·1. Experimental Design ..... 231
10·1·1 Experimental Aims ..... 231
10·1·2 Test Programs ..... 231
10·1·3 Methodology ..... 233
10·1·4 Garbage Collection ..... 235
10·1·5 Normalisation ..... 238
10·1·5·1 Overview ..... 238
10·1·5·2 Sequential Execution ..... 240
10·1·5·3 Parallel Execution ..... 241
10·1·5·4 Consolidation ..... 244
10·1·6 Results Presentation ..... 246
10·2. Sequential Test Program Results ..... 247
10·2·1 ansi ..... 247
10·2·1·1 Introduction ..... 247
10·2·1·2 Collected Data ..... 247
10·2·1·3 Sequential Performance Accuracy ..... 249
10·2·1·4 Parallel Performance ..... 249
10·2·1·5 Overall Scheme Comparison ..... 254
10·2·1·6 Speed-Up Approximation ..... 254
10·2·1·7 Observations ..... 254
10·2·2 eliza ..... 255
10·2·2·1 Introduction ..... 255
10·2·2·2 Collected Data ..... 255
10·2·2·3 Sequential Performance Accuracy ..... 257
10·2·2·4 Parallel Performance ..... 257
10·2·2·5 Overall Scheme Comparison ..... 262
10·2·2·6 Speed-Up Approximation ..... 262
10·2·2·7 Observations ..... 262
10·2·3 primes ..... 263
10·2·3·1 Introduction ..... 263
10·2·3·2 Collected Data ..... 263
10·2·3·3 Sequential Performance Accuracy ..... 265
10·2·3·4 Parallel Performance ..... 265
10·2·3·5 Overall Scheme Comparison ..... 270
10·2·3·6 Speed-Up Approximation ..... 270
10·2·3·7 Observations ..... 270
10·2·4 queens ..... 271
10·2·4·1 Introduction ..... 271
10·2·4·2 Collected Data ..... 271
10·2·4·3 Sequential Performance Accuracy ..... 273
10·2·4·4 Parallel Performance ..... 273
10·2·4·5 Overall Scheme Comparison ..... 278
10·2·4·6 Speed-Up Approximation ..... 278
10·2·4·7 Observations ..... 278
10·2·5 parser ..... 279
10·2·5·1 Introduction ..... 279
10·2·5·2 Collected Data ..... 279
10·2·5·3 Sequential Performance Accuracy ..... 281
10·2·5·4 Parallel Performance ..... 281
10·2·5·5 Overall Scheme Comparison ..... 286
10·2·5·6 Speed-Up Approximation ..... 286
10·2·5·7 Observations ..... 286
10·2·6 veritas ..... 287
10·2·6·1 Introduction ..... 287
10·2·6·2 Collected Data ..... 287
10·2·6·3 Sequential Performance Accuracy ..... 289
10·2·6·4 Parallel Performance ..... 289
10·2·6·5 Overall Scheme Comparison ..... 294
10·2·6·6 Speed-Up Approximation ..... 294
10·2·6·7 Observations ..... 294
10·3. Parallel Test Program Results ..... 295
10·3·1 parfact ..... 295
10·3·1·1 Introduction ..... 295
10·3·1·2 Collected Data ..... 295
10·3·1·3 Sequential Performance Accuracy ..... 297
10·3·1·4 Parallel Performance ..... 297
10·3·1·5 Overall Scheme Comparison ..... 302
10·3·1·6 Speed-Up Approximation ..... 302
10·3·1·7 Observations ..... 302
10·3·2 parfib ..... 303
10·3·2·1 Introduction ..... 303
10·3·2·2 Collected Data ..... 303
10·3·2·3 Sequential Performance Accuracy ..... 305
10·3·2·4 Parallel Performance ..... 305
10·3·2·5 Overall Scheme Comparison ..... 310
10·3·2·6 Speed-Up Approximation ..... 310
10·3·2·7 Observations ..... 310
10·3·3 prsa ..... 311
10·3·3·1 Introduction ..... 311
10·3·3·2 Collected Data ..... 311
10·3·3·3 Sequential Performance Accuracy ..... 313
10·3·3·4 Parallel Performance ..... 313
10·3·3·5 Overall Scheme Comparison ..... 318
10·3·3·6 Speed-Up Approximation ..... 318
10·3·3·7 Observations ..... 318
10·3·4 soda ..... 319
10·3·4·1 Introduction ..... 319
10·3·4·2 Collected Data ..... 319
10·3·4·3 Sequential Performance Accuracy ..... 321
10·3·4·4 Parallel Performance ..... 321
10·3·4·5 Overall Scheme Comparison ..... 326
10·3·4·6 Speed-Up Approximation ..... 326
10·3·4·7 Observations ..... 326
10·4. ‘Sequentialised’ Parallel Test Program Results ..... 327
10·4·1 parfact ..... 327
10·4·1·1 Introduction ..... 327
10·4·1·2 Collected Data ..... 327
10·4·1·3 Sequential Performance Accuracy ..... 329
10·4·1·4 Parallel Performance ..... 329
10·4·1·5 Overall Scheme Comparison ..... 334
10·4·1·6 Speed-Up Approximation ..... 334
10·4·1·7 Observations ..... 334
10·4·2 parfib ..... 335
10·4·2·1 Introduction ..... 335
10·4·2·2 Collected Data ..... 335
10·4·2·3 Sequential Performance Accuracy ..... 337
10·4·2·4 Parallel Performance ..... 337
10·4·2·5 Overall Scheme Comparison ..... 342
10·4·2·6 Speed-Up Approximation ..... 342
10·4·2·7 Observations ..... 342
10·4·3 prsa ..... 343
10·4·3·1 Introduction ..... 343
10·4·3·2 Collected Data ..... 343
10·4·3·3 Sequential Performance Accuracy ..... 345
10·4·3·4 Parallel Performance ..... 345
10·4·3·5 Overall Scheme Comparison ..... 350
10·4·3·6 Speed-Up Approximation ..... 350
10·4·3·7 Observations ..... 350
10·4·4 soda ..... 351
10·4·4·1 Introduction ..... 351
10·4·4·2 Collected Data ..... 351
10·4·4·3 Sequential Performance Accuracy ..... 353
10·4·4·4 Parallel Performance ..... 353
10·4·4·5 Overall Scheme Comparison ..... 358
10·4·4·6 Speed-Up Approximation ..... 358
10·4·4·7 Observations ..... 358
10·5. Discussion ..... 359
10·6. Summary ..... 368

Chapter Eleven: Effective Runtime Management of Parallelism ..... 371
11·1. Summary ..... 371
11·1·1 Holistic View ..... 371
11·1·2 Identifying Parallelism ..... 373
11·1·3 Speculative Evaluation ..... 374
11·1·4 Load Distribution ..... 375
11·2. Contributions ..... 376
11·3. Further Work ..... 377
11·3·1 Thread Migration ..... 377
11·3·2 Change in Testbed Architecture ..... 378
11·3·3 Granularity and Data Locality Analysis ..... 378
11·3·4 Evaluation Strategies ..... 379
11·3·5 Detailed Analysis ..... 379
11·3·6 Re-Application to Java ..... 379

Chapter Twelve: References ..... 381

Appendix One: Collected Data from Experiments with Unaltered Test Programs ..... 413
A1·1. Introduction ..... 413
A1·2. ansi ..... 414
A1·3. eliza ..... 415
A1·4. primes ..... 416
A1·5. queens ..... 417
A1·6. parser ..... 418
A1·7. veritas ..... 419
A1·8. parfact ..... 420
A1·9. parfib ..... 421
A1·10 prsa ..... 422
A1·11 soda ..... 423

Appendix Two: Collected Data from Experiments with ‘Sequentialised’ Test Programs ..... 425
A2·1. Introduction ..... 425
A2·2. parfact ..... 426
A2·3. parfib ..... 427
A2·4. prsa ..... 428
A2·5. soda ..... 429
List of Figures

Figure 1·1: An example function ..... 6
Figure 2·1: Common topologies ..... 19
Figure 3·1: Example functional code ..... 37
Figure 3·2: Example functional code ..... 39
Figure 3·3: Example functional code ..... 42
Figure 4·1: Taxonometric key for load distribution algorithm classification ..... 49
Figure 5·1: Compilation phases of GHC ..... 67
Figure 5·2: ‘Hardware’ architecture for parallel program execution on a multiprocessor ..... 68
Figure 5·3: ‘Hardware’ architecture for parallel program execution on a network of computers ..... 69
Figure 5·4: Closure structure ..... 73
Figure 5·5: Fixed header decomposition ..... 74
Figure 5·6: The initial structure of an info table ..... 74
Figure 7·1: The simplified main runtime system cycle illustrating the context of sparking, scheduling, and evaluation ..... 127
Figure 7·2: The process of sparking ..... 130
Figure 7·3: Example functional code ..... 139
Figure 7·4: Thread hierarchy ..... 140
Figure 8·1 (a): The detailed process of sparking (including parent and priority identification) ..... 156
Figure 8·1 (b): The detailed process of sparking (including parent and priority identification) ..... 157
Figure 8·2: Closure entry behaviour ..... 159
Figure 8·3 (a): Closure update behaviour ..... 163
Figure 8·3 (b): Closure update behaviour ..... 164
Figure 8·3 (c): Closure update behaviour ..... 165
Figure 8·3 (d): Closure update behaviour ..... 166
Figure 8·4 (a): Thread termination behaviour ..... 167
Figure 8·4 (b): Thread termination behaviour ..... 168
Figure 8·4 (c): Thread termination behaviour ..... 169
Figure 9·1: A point-to-point topology neighbourhood boundary ..... 183
Figure 9·2: A PVM bus topology neighbourhood boundary ..... 184
Figure 9·3: A pathological scenario ..... 190
Figure 9·4: A secondary-panic pathological scenario ..... 193
Figure 9·5: A pathological scenario on a bus network ..... 194
Figure 9·6: The main runtime system cycle (including spark percolation) ..... 195
Figure 9·7 (a): The spark percolation algorithm — initialisation step ..... 197
Figure 9·7 (b): The spark percolation algorithm — infinite iterative step ..... 197
Figure 9·8 (a): The message processing cycle (including spark percolation) ..... 201
Figure 9·8 (b): The message processing cycle (including spark percolation) ..... 202
Figure 9·8 (c): The message processing cycle (including spark percolation) ..... 203
Figure 9·9 (a): Before spark percolation communication ..... 204
Figure 9·9 (b): After spark percolation communication ..... 205
Figure 9·10 (a): Before spark percolation communication ..... 206
Figure 9·10 (b): After spark percolation communication ..... 206
Figure 9·11 (a): Before spark percolation communication ..... 207
Figure 9·11 (b): After spark percolation communication ..... 207
Figure 9·12 (a): Before spark percolation communication ..... 208
Figure 9·12 (b): After spark percolation communication ..... 208
Figure 9·13 (a): Before spark percolation communication ..... 209
Figure 9·13 (b): After spark percolation communication ..... 209
Figure 9·14 (a): Before spark percolation communication ..... 210
Figure 9·14 (b): After spark percolation communication ..... 210
Figure 9·15 (a): Initial situation ..... 211
Figure 9·15 (b): Extent of central processing element’s neighbourhood ..... 212
Figure 9·15 (c): Extent of central processing element’s neighbour’s neighbourhoods ..... 212
Figure 9·15 (d): Extent of outlying processing element’s neighbourhoods ..... 213
Figure 9·15 (e): After resolution of panic ..... 214
Figure 9·16 (a): A pathological scenario ..... 216
Figure 9·16 (b): A pathological scenario including neighbourhood detail ..... 216
Figure 9·16 (c): A pathological scenario resolved ..... 218
Figure 9·17 (a): Initial situation ..... 218
Figure 9·17 (b): The beginnings of panic ..... 219
Figure 9·17 (c): The continuation of panic ..... 220
Figure 9·17 (d): The resumption of panic ..... 221
Figure 9·17 (e): The final status of the network ..... 225
Figure 10·1: Abstract main parallel runtime system cycle ..... 239
Figure 10·2: Abstract main cycle ..... 240
Figure 10·3: Box-whisker plot for sequential ansi ..... 249
Figure 10·4: Box-whisker plot for original GUM ansi ..... 249
Figure 10·5: Box-whisker plot for initialisation and execution components for original GUM ansi ..... 250
Figure 10·6: Cumulative bar chart of runtime components for original GUM ansi ..... 250
Figure 10·7: Box-whisker plot for prioritised GUM ansi ..... 251
Figure 10·8: Box-whisker plot for initialisation and execution components for prioritised GUM ansi ..... 251
Figure 10·9: Cumulative bar chart of runtime components for prioritised GUM ansi ..... 252
Figure 10·10: Box-whisker plot for prioritised GUM with spark percolation ansi ..... 252
Figure 10·11: Box-whisker plot for initialisation and execution components for prioritised GUM with spark percolation ansi ..... 253
Figure 10·12: Cumulative bar chart of runtime components for prioritised GUM with spark percolation ansi ..... 253
Figure 10·13: Execution times for ansi under the four schemes ..... 254
Figure 10·14: Speed-up values for ansi under the three parallel schemes ..... 254
Figure 10·15: Box-whisker plot for sequential eliza ..... 257
Figure 10·16: Box-whisker plot for original GUM eliza ..... 257
Figure 10·17: Box-whisker plot for initialisation and execution components for original GUM eliza ..... 258
Figure 10·18: Cumulative bar chart of runtime components for original GUM eliza ..... 258
Figure 10·19: Box-whisker plot for prioritised GUM eliza ..... 259
Figure 10·20: Box-whisker plot for initialisation and execution components for prioritised GUM eliza ..... 259
Figure 10·21: Cumulative bar chart of runtime components for prioritised GUM eliza ..... 260
Figure 10·22: Box-whisker plot for prioritised GUM with spark percolation eliza ..... 260
Figure 10·23: Box-whisker plot for initialisation and execution components for prioritised GUM with spark percolation eliza ..... 261
Figure 10·24: Cumulative bar chart of runtime components for prioritised GUM with spark percolation eliza ..... 261
Figure 10·25: Execution times for eliza under the four schemes ..... 262
Figure 10·26: Speed-up values for eliza under the three parallel schemes ..... 262
Figure 10·27: Box-whisker plot for sequential primes ..... 265
Figure 10·28: Box-whisker plot for original GUM primes ..... 265
Figure 10·29: Box-whisker plot for initialisation and execution components for original GUM primes ..... 266
Figure 10·30: Cumulative bar chart of runtime components for original GUM primes ..... 266
Figure 10·31: Box-whisker plot for prioritised GUM primes ..... 267
Figure 10·32: Box-whisker plot for initialisation and execution components for prioritised GUM primes ..... 267
Figure 10·33: Cumulative bar chart of runtime components for prioritised GUM primes ..... 268
Figure 10·34: Box-whisker plot for prioritised GUM with spark percolation primes ..... 268
Figure 10·35: Box-whisker plot for initialisation and execution components for prioritised GUM with spark percolation primes ..... 269
Figure 10·36: Cumulative bar chart of runtime components for prioritised GUM with spark percolation primes ... 269
Figure 10·37: Execution times for primes under the four schemes ... 270
Figure 10·38: Speed-up values for primes under the three parallel schemes ... 270
Figure 10·39: Box-whisker plot for sequential queens ... 273
Figure 10·40: Box-whisker plot for original GUM queens ... 273
Figure 10·41: Box-whisker plot for initialisation and execution components for original GUM queens ... 274
Figure 10·42: Cumulative bar chart of runtime components for original GUM queens ... 274
Figure 10·43: Box-whisker plot for prioritised GUM queens ... 275
Figure 10·44: Box-whisker plot for initialisation and execution components for prioritised GUM queens ... 275
Figure 10·45: Cumulative bar chart of runtime components for prioritised GUM queens ... 276
Figure 10·46: Box-whisker plot for prioritised GUM with spark percolation queens ... 276
Figure 10·47: Box-whisker plot for initialisation and execution components for prioritised GUM with spark percolation queens ... 277
Figure 10·48: Cumulative bar chart of runtime components for prioritised GUM with spark percolation queens ... 277
Figure 10·49: Execution times for queens under the four schemes ... 278
Figure 10·50: Speed-up values for queens under the three parallel schemes ... 278
Figure 10·51: Box-whisker plot for sequential parser ... 281
Figure 10·52: Box-whisker plot for original GUM parser ... 281
Figure 10·53: Box-whisker plot for initialisation and execution components for original GUM parser ... 282
Figure 10·54: Cumulative bar chart of runtime components for original GUM parser ... 282
Figure 10·55: Box-whisker plot for prioritised GUM parser ... 283