• No results found

Embarrassingly Parallel Codes

1 INTRODUCTION

1.1 Background

1.1.2 Embarrassingly Parallel Codes

In contrast to PDES, many parallel and distributed programs involve problems that fall into the embarrassingly parallel (EP) class of computational codes. EP programs are typically classified by their lack of dependency between partitions of work (e.g., non- existent interprocess communication) and trivial partitioning of the problem into parts. Given sufficient resources, these programs offer nearly linear speedups because of these advantages. 1

lim

( )

s N

f N

T

− →∞

=

(1.2)

When an unlimited number of processors are available, Amdahl’s law can be expressed as the inverse of the non-parallelizable portion of a parallel and distributed program as shown in equation (1.2). EP codes typically have non-existent interprocess or

inter-partition communication thus the value of TS is close to zero yielding very large

values for f(N).

An example EP application is a simulation based on the Monte Carlo method. A simple Monte Carlo method utilizes random numbers generated from probability

distributions that closely resembles existing data that are processed through a

mathematical model. Results are then aggregated to produce a final answer. For instance, the following Monte Carlo problem attempts to estimate π given the volume of a specific sphere with a radius of 2 centered at the origin:

2 2 2

4

x

+

y

+

z

=

(1.3)

A computer program can generate uniformly random values for x, y, and z. These three random values can be evaluated using equation (1.3). If the three-dimensional point falls within the sphere space, the result of the Monte Carlo trial is considered a “hit.”

(

)

3 3 4 # 3 # r hits valid trials r r π ≈ × (1.4) 6 # # hits valid trials

π

≈ × (1.5)

Equation (1.4) represents the proportional relationship between known equations of the volume of a sphere and the volume of cube to the Monte Carlo hits and Monte Carlo trials that fall within the volume of cube. Solving for π yields equation (1.5).

Monte Carlo methods such as these hit-and-miss scenarios are embarrassingly parallel as trials can be computed independently using any number of threads or processors without the need for synchronization. To produce an answer, all that is required is an aggregation of results at the end and the evaluation of equation (1.5). Lack

of dependency and communication during computation allow EP codes such as Monte Carlo simulations to be highly scalable and massively distributed in nature.

Task parallelism involves programs that are run in parallel to produce multiple replications or achieve large speedups over sequential executions that are repeated one after another. At the most fundamental level, these codes can be classified as an EP style of work distribution and computation. Task parallelism can be fine- or coarse-grained. Fine-grained task parallelism consists of distributing a portion of the program to other available processors such as parallelizing the execution of data-independent for-loops in programs like that of a matrix multiplication problem. Several libraries exist to ease implementation on the programming language level such as OpenMP [32],

Microsoft’s .Net Task Parallel Library (TPL) [33] and the MATLAB Parallel Computing Toolbox [34]. Coarse-grained task parallel programs are those where large portions or complete programs are tasked to processors independently. For instance, in modeling hurricane predictions, a set of variables that affect hurricane trajectory are tested. For each set of modifications to environmental variables, tasks can be created and run independently on separate machines. This allows the gathering of different trajectory data in parallel without having to run each task one after the other. The limiting factors to this approach are the amount of computational resources available and per-node memory and disk capacities. The following is a brief survey of notable and popular wide-area coarse-grained task parallel executions and simulations.

The Great Internet Mersenne Prime Search (GIMPS) is the earliest known wide- area coarse-grained task parallel execution project delivered over the Internet [35]. This project attempts to discover large Mersenne prime numbers (2n - 1). Since these numbers

are very large and grow exponentially, these tasks are computational intense but can be run independently from other users. Distributed.net is similar to GIMPS, but with different goals [36]. Distributed.net is known for distributing RSA Securities key

cracking challenges in which the entire key space is searched using brute force methods. The key space can be trivially partitioned with non-existent data dependencies.

SETI@home (Search for Extra-Terrestrial Intelligence) is an extremely popular task parallel execution that has been widely distributed, and is widely regarded as the first wide-area computational effort to gain traction in the general public [37]. This project partitions radio telescope data into frequency-independent portions that can be leased to individual volunteers and run with high concurrency as there is no need for

synchronization among each partition.

Folding@home is a popular task parallel simulator dedicated to modeling and simulation of protein folding and discovering issues when proteins do not fold correctly leading to diseases [38]. The humanitarian effort of the project has garnered widespread appeal while being as easily accessible as task parallel executions. This effort has

become so popular that ports of the software have been made to run on stream processors including graphics cards and home entertainment consoles such as the Sony

PlayStation®3 [39]. The World Community Grid is a meta project performing simulations on a variety of humanitarian-focused programs such as human proteome folding and identifying potential drug candidates with the FightAIDS@home project [40]. ClimatePrediction.net distributes different climate simulation models to clients to predict future climate patterns and changes [41].

EP codes, with the distinguishing characteristic of no interprocess communication and messaging are performed on a range of different distributed computing

infrastructures. In contrast, PDES programs are most often exclusively performed on computational resources that are tightly coupled for maximum performance. The following section details various execution platforms used for both PDES and EP codes.