
6.8 Preliminary discussion

6.8.3 Input data for experiments

In this section, we outline the process of acquiring the input data used for each application. Our input data consists of synthetic data, real data, or a combination of both. In some applications the size of the input data is limited by the capabilities of our hosts. Note that all input data for all the applications are stored in files in secondary storage and re-used on each of the host systems.

DPS. The input data for our DPS application is wholly synthetic. The number of machines is defined as an integer value from the set M = {16, 32, 64, 128, 256}. The number of jobs is also an integer value, defined in the set J = {100, 200, 400, 800, 1600, 3200}. The input data for the lookup tables CE, CT, PE and PT consist of integers generated uniformly at random based on the intervals shown in Table 6.2.

The CE and CT lookup tables are generated once for each i ∈ M. This means that we have a total of 10 data instances, 5 each for CE and CT. For example, instance $CE_{16}$ consists of a lookup table with 16 rows and 16 columns corresponding to communication energy costs among 16 machines, in both directions.

| Quantity | Minimum value | Maximum value |
|---|---|---|
| Communication time, CT | 1 | 2 |
| Communication energy, CE | 2 | 10 |
| Processing time, PT | 1 | 4 |
| Processing energy, PE | 4 | 30 |

Table 6.2: Intervals used in data generation for DPS. These intervals are inclusive in the resulting data.

A similar approach is used for PE and PT. Here, the lookup tables are generated once for each possible combination of (i ∈ M, j ∈ J). For instance, $PE_{16,100}$ is a 16x100 lookup table showing the processing energy costs of 100 jobs on 16 machines. We therefore have a total of 60 instances, 30 each for PE and PT.
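The generation of the four kinds of lookup table can be sketched as follows. This is a minimal illustration, not the generator used for the experiments: the function names, the dictionary keys and the `seed` parameter are assumptions, and the tables are kept in memory rather than written to files. Only the sets M and J and the inclusive intervals of Table 6.2 come from the text.

```python
import random

# Inclusive integer intervals from Table 6.2.
INTERVALS = {"CT": (1, 2), "CE": (2, 10), "PT": (1, 4), "PE": (4, 30)}
M = [16, 32, 64, 128, 256]              # numbers of machines
J = [100, 200, 400, 800, 1600, 3200]    # numbers of jobs


def make_table(kind, rows, cols, rng):
    """Return a rows x cols table of integers drawn uniformly at random
    from the inclusive interval associated with `kind`."""
    lo, hi = INTERVALS[kind]
    return [[rng.randint(lo, hi) for _ in range(cols)] for _ in range(rows)]


def generate_dps_tables(machines=M, jobs=J, seed=0):
    """Build one CE/CT table per machine count and one PE/PT table per
    (machine count, job count) pair. `seed` is a hypothetical parameter."""
    rng = random.Random(seed)
    comm = {(kind, i): make_table(kind, i, i, rng)
            for kind in ("CE", "CT") for i in machines}
    proc = {(kind, i, j): make_table(kind, i, j, rng)
            for kind in ("PE", "PT") for i in machines for j in jobs}
    return comm, proc
```

With the full sets M and J, `comm` holds 10 tables and `proc` holds 60, matching the instance counts given above.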

Each job data instance is simply a list of integers representing the job deadlines. Each deadline is generated such that there is a reasonable gap between consecutive deadlines. To achieve this, we define three constants: the communication frame ($F_c$), the processing frame ($F_p$) and the window. The communication frame is derived from the maximum possible communication time; in this case, it has a value of 2 (Table 6.2). Similarly, the processing frame is derived from the maximum processing time plus 1; hence, it has a value of 5. The window is the sum of $F_c$ and $F_p$, that is, 7. With these parameters, the deadline for a given job $J_j$, for $j > 0$, is generated uniformly at random in $\{I_{min}, I_{min} + window\}$, where $I_{min} = deadline_{j-1} + F_c$. For the first job, $J_0$, this interval is defined as $\{F_c + 2, window + 2\}$. Finally, the deadline of the last job, $J_n$, is always defined as $deadline_{n-1}$ rounded up to the nearest integer divisible by 256. This ensures that we always have a value divisible by the number of work-items when computing a solution on the GPU since, for NVIDIA GPUs with OpenCL 1.0, the total number of work-items must be divisible by the work-group size.
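The deadline rule can be sketched as below. This is an illustrative reading of the description rather than the original generator: it assumes that deadlines are integers, that both interval endpoints are inclusive, and that the last deadline is obtained by rounding the preceding one up to a multiple of 256. The function and constant names are hypothetical.

```python
import random

F_C = 2              # communication frame: maximum communication time (Table 6.2)
F_P = 4 + 1          # processing frame: maximum processing time plus 1
WINDOW = F_C + F_P   # window = 7
ALIGN = 256          # the last deadline must be divisible by this value


def generate_deadlines(n_jobs, rng):
    """Return a list of n_jobs integer deadlines (assumes n_jobs >= 2)."""
    if n_jobs < 2:
        raise ValueError("need at least two jobs")
    # First job: drawn from the interval {F_c + 2, window + 2}.
    deadlines = [rng.randint(F_C + 2, WINDOW + 2)]
    # Intermediate jobs: drawn from {I_min, I_min + window},
    # where I_min is the previous deadline plus F_c.
    for _ in range(1, n_jobs - 1):
        i_min = deadlines[-1] + F_C
        deadlines.append(rng.randint(i_min, i_min + WINDOW))
    # Last job: previous deadline rounded up to a multiple of ALIGN.
    previous = deadlines[-1]
    deadlines.append(-(-previous // ALIGN) * ALIGN)
    return deadlines
```

For example, `generate_deadlines(100, random.Random(0))` yields 100 deadlines whose final value is a multiple of 256.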

GapsMis. The input data consists of a combination of synthetic data (derived from processing real data) and real data obtained from GenBank FTP [78], which contains sequence databases in ASN.1 format. The length of the target sequences is fixed at 250, while the length of the query sequences can be selected from four configurations: 75, 100, 150 and 200. To obtain input sequences of the desired length, real sequences are sampled and processed. For instance, to obtain an input sequence of 75 characters, a real sequence containing more than 75 characters is chosen uniformly at random from the database, and characters are deleted from random positions until 75 characters remain. Some real input data contain thousands of characters, so it is possible to re-use the same sequence to generate multiple synthetic sequences. Table 6.3 lists the databases used. The databases selected, when combined, provide us with enough data to generate our input data.
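The shortening step described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that each deletion position is chosen uniformly at random among the remaining characters; reading and sampling sequences from the GenBank files is omitted.

```python
import random


def shorten_sequence(sequence, target_length, rng):
    """Delete characters at random positions until `target_length`
    characters remain. The source sequence must be strictly longer."""
    if len(sequence) <= target_length:
        raise ValueError("source sequence must be longer than the target")
    chars = list(sequence)
    while len(chars) > target_length:
        del chars[rng.randrange(len(chars))]
    return "".join(chars)


def is_subsequence(short, full):
    """Check that `short` can be obtained from `full` by deletions only."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)
```

Because only deletions are applied, every generated sequence is a subsequence of the real sequence it was derived from, which `is_subsequence` checks.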

| Name of database file | Number of sequences | Length of longest sequence | Length of shortest sequence |
|---|---|---|---|
| gbbct10.fsa_aa | 151,777 | 16,990 | 100 |
| gbbct11.fsa_aa | 172,113 | 14,474 | 100 |
| gbbct24.fsa_aa | 164,027 | 13,362 | 100 |

Table 6.3: Information for GenBank databases used to generate input sequences for GapsMis.

The substitution matrix used is the BLOSUM62 matrix [77] for aligning protein sequences. A gap open penalty and gap extension penalty of 10.0 and 0.5, respectively, were used for all executions of the experiments. Finally, the experiments were conducted for an alignment that allowed for 2 gaps and then repeated for 3 gaps.

Velvet. The input data is a collection of particles. Each particle is characterized by a mass, position and velocity. The position and velocity properties are 3-dimensional vector quantities while the mass is a scalar quantity. The values that make up the velocity vector are generated using a uniform distribution based on minimum and maximum values. For the experiments, we use a minimum value of −10.00 and maximum of 10.00 for the uniform distribution. All values are real numbers.

The starting positions of the particles are obtained using a method for generating uniformly distributed random points within an n-ball [70]. To achieve this, for each particle, we generate a 3-dimensional vector consisting of real numbers in (0, 1). Next, we calculate the radius for the position vector. Suppose the position is given by the vector $\vec{p} = (x, y, z)$; the radius, $r$, is computed as $r = \sqrt{x^2 + y^2 + z^2}$. The position on the surface of the n-ball is given by $\frac{1}{r} \cdot \vec{p}$. The final position of a particle within the n-ball is given by $u^{\frac{1}{n}} \cdot \vec{p}$. In our simulations $n = 3$, which gives an ordinary ball.
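The position computation can be sketched as below, following the steps as written. Two points are assumptions rather than statements from the text: $u$ is taken to be a real number drawn uniformly at random from [0, 1), and the final scaling by $u^{1/n}$ is applied to the unit-length vector $\frac{1}{r} \cdot \vec{p}$. The function name is hypothetical.

```python
import math
import random


def sample_position(rng, n=3):
    """Return one starting position inside the unit n-ball."""
    while True:
        # Components drawn from the unit interval, as described in the text.
        p = [rng.random() for _ in range(n)]
        r = math.sqrt(sum(c * c for c in p))
        if r > 0.0:  # guard against the (extremely unlikely) zero vector
            break
    # Assumption: u is uniform in [0, 1).
    u = rng.random()
    # Project onto the surface (1/r * p), then scale inwards by u^(1/n).
    scale = u ** (1.0 / n) / r
    return [scale * c for c in p]
```

By construction, the distance of each returned point from the origin is $u^{1/n}$, so every point lies within the unit ball.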

The problem sizes include ensembles of 2048, 4096, 8192, 16384, 32768 and 65536 particles and the simulation is repeated for each size. For all devices, the integrator is set to run for 10 iterations in total.

FDGV. The input data for the graphs consists of a combination of synthetic and real data.

The real data graphs are available from Stanford Network Analysis Platform (SNAP) [63].

| Name of dataset file | ID | Vertices | Edges |
|---|---|---|---|
| p2p-Gnutella08 | GNUT 1 | 6,301 | 20,777 |
| p2p-Gnutella24 | GNUT 2 | 26,518 | 65,369 |

Table 6.4: Details of the real graph data obtained from SNAP.

Our synthetic graph data consists of three types of graphs: complete graphs, grid graphs and trees. The complete graphs and grid graphs were generated by the application, while the igraph [44] tool was used to generate the tree graphs. The properties of the graphs used as inputs are listed in Table 6.5.

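| Type of graph | ID | Vertices | Edges |
|---|---|---|---|
| Complete | COMP 1 | 100 | 4,950 |
| Complete | COMP 2 | 200 | 19,900 |
| Complete | COMP 3 | 400 | 79,800 |
| Grid | GRID 1 | 10,000 | 19,800 |
| Grid | GRID 2 | 20,000 | 39,700 |
| Grid | GRID 3 | 40,000 | 79,500 |
| Tree | TREE 1 | 10,000 | 9,999 |
| Tree | TREE 2 | 20,000 | 19,999 |
| Tree | TREE 3 | 40,000 | 39,999 |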

Table 6.5: Details of the synthetic graph data generated for FDGV application.

The three graph types are chosen to represent three distinct cases with respect to the ratio of edges to vertices. A complete graph has far more edges than vertices. In a tree, the number of edges is almost equal to the number of vertices. In a grid graph, the number of edges is almost twice the number of vertices. The combination of the real and synthetic graph data provides us with a considerable variety of data for testing our FDGV application. For our runs, the application is set to complete 10 iterations of the graph layout process.
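The edge counts in Table 6.5 can be checked against the standard formulas for each graph type, as in the sketch below. The grid dimensions are not given in the text: 100x100, 100x200 and 100x400 are assumed here only because they reproduce the listed vertex and edge counts, and other dimensions may have been used. The function names are hypothetical.

```python
def complete_edges(n):
    """Edges in a complete graph on n vertices: n(n - 1) / 2."""
    return n * (n - 1) // 2


def grid_edges(rows, cols):
    """Edges in a rows x cols grid graph: horizontal plus vertical links."""
    return rows * (cols - 1) + cols * (rows - 1)


def tree_edges(n):
    """Edges in any tree on n vertices: n - 1."""
    return n - 1


# Vertex counts from Table 6.5; the grid shapes are assumptions.
for n in (100, 200, 400):
    print("complete", n, complete_edges(n))
for rows, cols in ((100, 100), (100, 200), (100, 400)):
    print("grid", rows * cols, grid_edges(rows, cols))
for n in (10_000, 20_000, 40_000):
    print("tree", n, tree_edges(n))
```

For the grid, the count $2 \cdot rows \cdot cols - rows - cols$ approaches twice the number of vertices as the grid grows, which matches the "almost twice" ratio described above.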

In document A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing (Page 129-132)