1 INTRODUCTION
1.1 Background
1.1.3 Execution Platforms
As mentioned previously, the usual goals for parallelizing a simulation involve increasing model fidelity, integrating multiple simulators, and/or reducing execution runtime. To achieve these goals, PDES simulations are typically run on high performance computing (HPC) systems, while recent wide-area coarse-grained task parallel simulations utilize resources afforded by the distributed nature of the Internet. Both of these distinct yet related distributed computing paradigms are described next.
1.1.3.1 Tightly Coupled Resources
Cluster computing is the cornerstone of traditional high performance parallel and distributed computing infrastructures. Typical low-cost cluster computing systems are commercial off-the-shelf (COTS) machines with multiple processors or single-package multi-core processors. Networking these systems together using low-cost gigabit Ethernet provides good bandwidth and relatively low latencies for parallel and distributed simulation. These COTS clusters provide a balance between total cost and performance and are widely deployed.
Tightly coupled cluster computing systems trade low cost for higher performance than their COTS counterparts. These systems include faster and perhaps specialized processors and high-bandwidth, ultra-low-latency switching fabrics such as InfiniBand, Quadrics, and Myrinet. Fine-grained PDES simulations can benefit greatly from these types of HPC systems, which can reduce the latency of synchronization operations such as conservative barrier mechanisms and optimistic GVT calculations, in addition to enabling faster messaging rates.
The very high end of HPC platforms involves supercomputing infrastructures. Some supercomputing facilities are simply constructed by scaling tightly coupled cluster computing systems, while others, such as the IBM BlueGene system [42], are custom designed from the ground up to include specialized processors, interconnects, and software, including middleware tools and operating systems. It is no surprise that these systems offer the highest potential performance and scalability for PDES codes [43]. Novel techniques have also been applied to codes suited to stream processors, such as those found on general purpose graphics processing units (GPGPU), which are able to provide supercomputing-like performance with only a handful of graphics cards [44-46].
1.1.3.2 Loosely Coupled Resources
Grid computing through web services involves linking together various resources from different organizations and institutions to form a metacomputing platform [47]. These resources can be clusters, supercomputers, and even desktop computers [48]. The Globus Toolkit provides a standard set of services to create these kinds of systems [49]. There has been work in federating distributed simulations utilizing the High Level Architecture (HLA) over grids with IDSim [50] and SOHR [51-53]. Other related works include web-based simulation [54, 55], the Extensible Modeling and Simulation Framework (XMSF) for web services [56], object request broker (ORB) based frameworks [57, 58], and a framework for Time Warp on grids [59].
Both PDES and task parallel codes are well-suited to run on traditional tightly coupled HPC infrastructures. Coarse-grained task parallel executions and simulations may have less need for high-bandwidth, low-latency interconnects; however, fast processors and large memory pools are not a detriment to task parallelism.
A practical limitation of HPC infrastructures concerns their availability. Although grid systems alleviate some of the availability problem, limits on access and restrictions on execution still exist. Allocated time on these systems is limited, and the systems are generally not as widely accessible as the loosely coupled metacomputing infrastructures described next.
1.1.3.3 Loosely Coupled Resources and Metacomputing
Recent advances in Internet-scale distributed computing have transformed the scope of certain computational workloads that, in the past, were reserved for large supercomputing facilities and clusters of high-powered, dedicated machines. These wide-area distributed computing infrastructures are commonly referred to as public resource or volunteer computing platforms. Machines from all over the world form a virtual single super-resource offering computational capacity at the discretion of the user. Figure 5 illustrates a typical program lifecycle run on a volunteer computing framework.
Figure 5: Volunteer Computing Task Life Cycle (1. Program submission; 2. Program split into tasks; 3. Clients request work; 4. Clients return completed work; 5. Results are collected; 6. Results are retrieved and viewed)
Users may direct computations to occur only when their computers are idle (e.g., when the screensaver is active), or they may direct the client to perform computation at all times using one or more of the processing cores available on their system. Many of these projects have been discussed as task parallel executions or simulators, such as distributed.net, SETI@home, Folding@home, World Community Grid, SZTAKI [60], and ClimatePrediction.net. Many of these task parallel volunteer computing projects are enabled by the Berkeley Open Infrastructure for Network Computing (BOINC) middleware software [61]. Other software solutions for volunteer computing include XtremeWeb [62], Unicorn [63], InteGrade [64], Harmony [65], DIRAC [66], and Xgrid [67]. These volunteer computing systems have offered very high computational throughput, rivaling even the fastest supercomputers [68]. BOINC-enabled projects have a combined throughput typically exceeding 1 petaflop [69], while Folding@home is the fastest distributed computing resource in the world, exceeding 4 petaflops with over 350,000 active CPUs [70], mainly due to its large pool of stream processors, including GPUs and the Sony PlayStation®3.
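The six steps of the task life cycle in Figure 5 can be sketched as a small single-process simulation. This is only an illustrative sketch, not the interface of BOINC or of any other middleware named above: the function names `split_into_tasks` and `run_life_cycle`, the round-robin client order, and the chunk size are all hypothetical, and no real networking or scheduling is modeled.

```python
from collections import deque


def split_into_tasks(program_input, chunk_size):
    """Step 2: split the submitted input into independent (task_id, payload) tasks."""
    return deque(
        (task_id, program_input[start:start + chunk_size])
        for task_id, start in enumerate(range(0, len(program_input), chunk_size))
    )


def run_life_cycle(program_input, work_fn, num_clients=3, chunk_size=4):
    """Simulate steps 1-6 of the life cycle in one process (hypothetical sketch)."""
    pending = split_into_tasks(program_input, chunk_size)  # steps 1-2: submit and split
    collected = {}
    client = 0
    while pending:
        task_id, payload = pending.popleft()       # step 3: a client requests work
        result = work_fn(payload)                  # the client computes locally
        collected[task_id] = (client, result)      # steps 4-5: result returned and collected
        client = (client + 1) % num_clients        # next client, in round-robin order
    # Step 6: results are retrieved in task order for viewing.
    return [collected[task_id][1] for task_id in sorted(collected)]


# Usage: sum the integers 0..9 as three independent partial sums.
partial_sums = run_life_cycle(list(range(10)), sum)
total = sum(partial_sums)
```

Because each task is independent, the order in which clients return work does not affect the final result, which is the property that makes this workload class a good fit for volunteer platforms.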
The emergence of grid and web services has allowed organizations and businesses to pool computational resources to service workloads using existing infrastructure and machines, such as desktop machines and laptops, that may not originally have been designated to provide processor cycles for these workloads. Desktop grid computing is similar to the public resource computing paradigm in its processor scavenging and use of idle cycles; however, the scale, level of trust, and implied security are different [71, 72]. These desktop grids provide cost savings for organizations by enhancing computational capacity for additional workloads or by accommodating larger workloads not possible with the currently employed infrastructure. These systems are typically on an institutional or organizational scale with implicit trust. In contrast to volunteer computing systems with wide-area applicability in both system compatibility and deployment, organizational desktop grids provide relatively higher speed interconnects and accompanying bandwidth for computational tasks. These desktop grids follow the same tradition of processor scavenging as volunteer computing projects but provide additional benefits, such as a reduced or eliminated need to replicate work for validation to counter Byzantine failures such as intentionally corrupted and falsified results. With volunteer computing, results must always be verified and checked for accuracy to protect against misbehaving clients, whether the misbehavior is unintentional through hardware faults or intentional through result falsification for expedited credit. In desktop grid infrastructures, varying levels of assumptions can be made about the quality of returned results, as there may be a higher level of trust in, and maintenance of, the hardware and the users of the systems. This type of computing potential has been widely researched; the most notable middleware tools and services are Parallel Virtual Machine (PVM) [73] and Condor [74]. Other frameworks include Entropia [75] and a master/worker variant of Condor named Condor-MW [76]. Task parallel simulation replication work on a network of non-dedicated resources with varying workloads is described in [77].
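Replicating work for validation, as described above for volunteer computing, can be illustrated with a simple quorum check over the results returned by several clients for the same task. This is a minimal sketch under stated assumptions: the function name `validate_by_replication` and the `quorum` parameter are hypothetical, results are assumed to be exactly comparable values, and the code does not reproduce the validator of BOINC or any other framework cited here.

```python
from collections import Counter


def validate_by_replication(results, quorum):
    """Return the value reported by at least `quorum` replicas, or None if no value qualifies.

    `results` holds the answers returned by different clients for the same task
    (hypothetical setup; real platforms may compare results with a tolerance).
    """
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]  # most frequently reported value
    return value if votes >= quorum else None


# Usage: two of three replicas agree, so the result is accepted.
accepted = validate_by_replication([42, 42, 17], quorum=2)
# No two replicas agree, so the task would have to be reissued.
rejected = validate_by_replication([42, 17, 99], quorum=2)
```

A desktop grid with a higher level of trust could, under the same sketch, set the quorum to 1 and issue each task once, which corresponds to the reduced need for replication noted above.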
Both wide-area volunteer computing and desktop grids have certain disadvantages, such as limits on the types of applications that can be deployed. These infrastructures are almost exclusively tailored for task parallel simulations and executions, and, due to the public nature of most of these projects, simulations with sensitive data cannot be deployed. However, for the particular workloads for which these infrastructures are intended, they perform well if properly managed [78].