• No results found

Loop Parallelization

N/A
N/A
Protected

Academic year: 2021

Share "Loop Parallelization"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

© 2000 bei Prof. Dr. Uwe Kastens

Loop Parallelization

Compilation steps:

nested loops operating on arrays, sequentiell execution of iteration space

analyze data dependencies

data-flow: definition and use of array elements

transform loops

keep data dependencies intact parallelize inner loop(s)

map onto field or vector of processors

map arrays onto processors such that many acceses are local, transform index spaces

DECLARE B[0..N,0..N+1] FOR I := 1 ..N FOR J := 1 .. I B[I,J] := B[I-1,J]+B[I-1,J-1] END FOR END FOR N 1 1 N i j 1-N 1 N -1 i j 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 1 N 1 -N -1 i N j

Vorlesung Übersetzer II SS 2000 / Folie 52

Ziele:

Overview

in der Vorlesung:

Explain

• Application area: scientific computations

• goals: execute inner loops in parallel with efficient data access • transformation steps

(2)

© 2000 bei Prof. Dr. Uwe Kastens

Iteration Space of Nested Loops

Iteration space of n properly nested loops:

n-dimensional space of integral points (polytope)

each point (i1, ..., in) of that space represents an execution of the innermost loop body

loop bounds are not known before run-time iteration space is not necessarily orthogonal iteration space is sequentially enumerated

DECLARE B[0..N,0..N] FOR I := 1 .. N FOR J := 1 .. I B[I,J] := B[I-1,J]+B[I-1,J-1] END FOR END FOR Example:

Computation of Pascal´s triangle

N 1 1 N i j

Vorlesung Übersetzer II SS 2000 / Folie 53

Ziele:

Notion of iteration space

in der Vorlesung:

• Use the example for explanation • Show execution order of iteration points

• Stepsize greater than 1 causes unused points in the iteration space: non-convex polytope

Verständnisfragen:

(3)

© 2000 bei Prof. Dr. Uwe Kastens

Data Dependencies in Iteration Spaces

Data dependency from iteration point i1 to i2:

Iteration i1 computes a value that is used in iteration i2 (flow dependency) relative dependence vector d = i2 - i1 = (i21 - i11, ..., i2n - i1n)

holds for all iteration points except at the border

Flow-dependencies can not be directed against the execution order, can not point backward in time:

each dependence vector must be lexicographically positive, i. e. d = (0, ..., 0, di, ...), di > 0

DECLARE B[0..N,0..N] FOR I := 1 .. N FOR J := 1 .. I B[I,J] := B[I-1,J]+B[I-1,J-1] END FOR END FOR Example:

Computation of Pascal´s triangle

N 1 1 N i j

Vorlesung Übersetzer II SS 2000 / Folie 54

Ziele:

Understand dependencies in loops

in der Vorlesung:

Explain

• Vektor representation of dependencies • show examples

• show admissable directions graphically

Verständnisfragen:

(4)

© 2000 bei Prof. Dr. Uwe Kastens

Loop Transformation

The iteration space of a loop nest is transformed onto new coordinates. Goals:

execute innermost loop(s) in parallel improve locality of data accesses;

in space: storage of executing processor, in time: reuse of values stored in cache

systolic computation and communication scheme Data dependencies must point forward in time,

i.e. lexicographically positive and not within parallel dimensions

3 linear fundamental transformations:

Reversal: flip execution order for one dimension Permutation: exchange two loops of the loop nest

Skewing: add iteration count of an outer loop to that of an inner one

non-linear transformations, e. g.

Scaling: stretch the iteration space in one dimension, causes gaps Tiling: introduce additional inner loops that cover tiles of fixed size

Vorlesung Übersetzer II SS 2000 / Folie 55

Ziele:

Overview

in der Vorlesung:

• Explain the goals

• admissable directions of dependencies • Show diagrams for the transformations

(5)

© 2000 bei Prof. Dr. Uwe Kastens

Reversal

Iteration count of one loop is negated, that dimension is enumerated backward

Transformation matrix:

(

-1

0

1

0

)

*

( )

=

(

)

=

(

)

i

j

-i

j

ir

jr

loop variables old new

( )

1 1 -1 1 1 ... ... 0 0

2-dimensional:

for i = 0 to M for j = 0 to N ... for ir = -M to 0 for jr = 0 to N ... j i M N jr ir -M N original transformed

Vorlesung Übersetzer II SS 2000 / Folie 56

Ziele:

Understand reversal transformation

in der Vorlesung:

• Explain the effect of reversal transformation. • Explain the notation of the transformation matrix.

• there may be no dependencies in the direction of the reversed loop - they would point barckward after the transformation.

• Show an example where reversal enables loop fusion.

Verständnisfragen:

(6)

© 2000 bei Prof. Dr. Uwe Kastens

Skewing

The iteration count of an outer loop is added to the count of an inner loop;

the iteration space is shifted; the execution order of iteration points remains unchanged

(

1

f

1

0

)

*

( )

=

(

)

=

(

)

i

j

i

f*i+j

is

js

loop variables old new

( )

1 1 1 1 1 ... ... 0 0

2-dimensional:

for i = 0 to M for j = 0 to N ... for is = 0 to M

for js = f*is to N+f*is ... j i M N original transformed Transformation matrix: f js is M N N+M

Vorlesung Übersetzer II SS 2000 / Folie 57

Ziele:

Understand skewing transformation

in der Vorlesung:

• Explain the effect of skewing transformation. • Skewing is always applicable

• Skewing can enable loop permutation

Verständnisfragen:

(7)

© 2000 bei Prof. Dr. Uwe Kastens

Permutation

Two loops of the loop nest are interchanged; the iteration space is flipped;

the execution order of iteration points changes

(

0

1

0

1

)

*

( )

=

(

)

=

(

)

i

j

j

i

ip

jp

loop variables old new

( )

1 1 0 1 1 ... 0 0 0

2-dimensional:

for i = 0 to M for j = 0 to N ... for ip = 0 to N for jp = 0 to M ... j i M N original transformed Transformation matrix: 1 jp ip N M 1

Vorlesung Übersetzer II SS 2000 / Folie 58

Ziele:

Understand loop permutation

in der Vorlesung:

• Explain the effect of loop permutation

• Permutation often yields a parallelizable innermost loop.

Verständnisfragen:

(8)

© 2000 bei Prof. Dr. Uwe Kastens

Use of Transformation Matrices

Transformation matrix T defines new iteration counts in terms of the old ones: T* i = i´

Transformation matrix T transforms old dependency vectors into new ones: T* d = d´

inverse Transformation matrix T-1defines old iteration counts in terms of new ones, for transformation of index expressions in the loop body: T-1* i = i´

concatenation of transformations first T1 then T2 : T2 * T1 = T

(

-1

0

1

0

)

*

( )

=

(

)

=

(

)

i

j

-i

j

i’

j’

e. g. Reversal

(

-1

0

1

0

)

*

( )

=

(

)

1

1

-1

1

e. g.

(

-1

0

1

0

)

*

( )

=

(

)

e. g.

i’

j’

-i’

j’

=

( )

i

j

(

-1

)

1

0

0

e. g.

(

*

0

0

1

1

)

=

(

0

0

1

-1

)

Vorlesung Übersetzer II SS 2000 / Folie 59

Ziele:

Learn how to use the transformation matrices

in der Vorlesung:

• explain the 4 uses with examples • transform a loop completely

Verständnisfragen:

• Why do the dependence vectors change under a transformation, although the dependence between array elements remains unchanged?

(9)

© 2000 bei Prof. Dr. Uwe Kastens

Example for Transformation and Parallelization of a Loop

for i = 0 to N for j = 0 to M

a[i, j] = (a[i, j-1] + a[i-1, j]) / 2;

Parallelize the obove loop. 1. Draw the iteration space.

2. Compute the dependence vectors and draw examples of them into the iteration space. Why can the inner loop not be executed in parallel?

3. Apply a skewing transformation and draw the iteration space. 4. Apply the permutation transformation and draw the iteration space.

Explain why the inner loop now can be executed in parallel. 5. Compute the matrix of the composed transformation and

use it to transform the dependence vectors.

6. Compute the inverse of the transformation matrix and use it to transform the index expressions.

7. Write the complete loops with new loop variables ip and jp and new loop bounds.

Vorlesung Übersetzer II SS 2000 / Folie 60

Ziele:

Exercise the method with an example

in der Vorlesung:

• Explain the steps of the transformation. • Solution on C-61

Verständnisfragen:

(10)

© 2000 bei Prof. Dr. Uwe Kastens

Solution of the Transformation and Parallelization Example

2. A dependence in direction of the parallel dimension j is not allowed. 4. Both dependence vectors point forward in ip direction.

7. for ip = 0 to M+N

for jp = max (0, ip-M) to min (ip, N)

a[jp, ip-jp] = (a[jp, ip-jp-1] + a[jp-1, ip-jp]) / 2;

M=4 N=7 M=4 M=4 M+N N=7 M+N N=7

(

1 1

)

1 0

( )

0 1

( )

1 0 =

(

1 1

)

1 0

( )

1 0

( )

1 1 = i j jp ip

(

0 1

)

1 -1 Inverse

Vorlesung Übersetzer II SS 2000 / Folie 61

Ziele:

Solution for C-60

in der Vorlesung:

Explain

• the bounds of the iteration spaces, • the dependence vectors,

• the transformation matrix and its inverse, • the conditions for being parallelizable, • the transformation of the index expressions.

Verständnisfragen:

(11)

© 2000 bei Prof. Dr. Uwe Kastens

Inequalities Describe Loop Bounds

The bounds of a loop nest are described by a set of linear inequalities.

Each inequality separates the space in „inside and outside of the iteration space“:

positive (negative) factors represent upper (lower) bounds

(

)

*

(

)

(

)

-1

1

0

0

0

0

-1

1

i

j

0

M

0

N

B * i

c

1 2 3 4 1 2 3 4

(

)

*

(

)

(

)

-1

1

0

0

1

0

-1

1

i

j

0

M

0

N

1 2 3 4 N M 1 2 3 4 N M examp 1 examp 2

Vorlesung Übersetzer II SS 2000 / Folie 61a

Ziele:

Understand representation of bounds

in der Vorlesung:

• Explain matrix notation • Explain graphic interpretation

• There can be arbitrary many inequalities

Verständnisfragen:

(12)

© 2000 bei Prof. Dr. Uwe Kastens

Transformation of Loop Bounds

The inverse of a transformation matrix T-1 transforms a set of inequalities: B * T-1 i’ c

)

(

( )

*

i’

j’

0

M

0

N

1 2 3 4

)

(

1 1 0 1

(

)

1 -1 0 1 skewing inverse

( )

-1

1

0

0

0

0

-1

1

*

(

-11

)

0 1

( )

-1

1

1

-1

0

0

-1

1

( )

-1

1

1

-1

0

0

-1

1

B T-1 B * T-1 B * T-1 i’ c

examp 1

1 2 3 4 N M

new bounds:

Vorlesung Übersetzer II SS 2000 / Folie 61b

Ziele:

Understand the transformation of bounds

in der Vorlesung:

• Explain how the inequalities are transformed

Verständnisfragen:

(13)

Transformation and Parallelization

N 1 1 N i j 1-N 1 N -1 i j Iteration space original transformed DECLARE B[0..N,0..N] FOR I := 1 .. N FOR J := 1 .. I B[I,J] := B[I-1,J]+B[I-1,J-1] END FOR END FOR DECLARE B[0..N,0..N] FOR IS := 1 .. N FOR JS := 1-IS .. 0 B[IS,JS+IS] := B[IS-1,JS+IS]+B[IS-1,JS-1+IS] END FOR END FOR sequential time is

parallel prozessor mapping js mod 2

(i, j) -> (i, j-i) = (is, js)

is js

Vorlesung Übersetzer II SS 2000 / Folie 62

Ziele:

Example for parallelization

in der Vorlesung:

• Explain skewing transformation. • Inner loop in parallel.

• Explain the time and processor mapping.

• mod 2 folds the arbitrary large loop dimension on a fixed number of 2 processors.

Verständnisfragen:

• Give the matrix of this transformation.

(14)

© 2000 bei Prof. Dr. Uwe Kastens

Data Mapping

Goal: Distribute array elements over processors, such that

as many as possible accesses are local.

Index space of an array: n-dimensional space of integral index points (polytop)

same properties as iteration space same mathematic model

same transformations are applicable (Skewing, Reversal, Permutation, ...) no restriktions by data dependencies

Vorlesung Übersetzer II SS 2000 / Folie 63

Ziele:

reuse model of iteration spaces

in der Vorlesung:

Explain with examples of index spaces

Verständnisfragen:

(15)

Data Distribution for Parallel Loops

000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 000000000000000 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 111111111111111 1 N 1 -N -1 i N j N 1 N 1 i j DECLARE B[0..N,0..N] FOR I := 1 .. N FOR J := 1-I.. 0 B[I,J+I] := B[I-1,J+I]+B[I-1,J-1+I] END FOR END FOR DECLARE B[0..N,-N..N] ... B[I,J] := B[I-1,J-1]+B[I-1,J] Index space original transformed skewing (i,j) -> (i,j-i) P0 wr itesB[I,J+I] Data on P0 50% local 100%local

Vorlesung Übersetzer II SS 2000 / Folie 64

Ziele:

See the effect of index transformation

in der Vorlesung:

• Explain local and non-local accesses. • Explain index transformation. • Demonstrate improved locality. • Skewing causes unused storage.

Verständnisfragen:

References

Related documents

This fact, coupled with the AIDS epidemic, has led to women being left behind as single par- ents when their children are still young. The case of 34-year-old Veronica (not her

Considering the estimates of mode of inheritance and heterosis effect compared to better parent analysis showed that for efficient progress in SYP and TSW

Since individual larvae numbers did not differ between fish-inhabited and fishless ponds, yet the number of samples and odonate diversities did, the results support higher eveness

From the Study Page, you can access the Nesstar online analysis system, to run analysis of the data you are interested in online. To analyse data, simply click on the link

Another example is the lack of adequate structures for issuers in the market that generates low levels of transactions and capitalization, existing classes of investors for which

The notion of an event’s development is borrowed from Landman’s (1992) analysis of the English progressive, according to which this aspect denotes a function from a set

� 1. At a minimum, get the confirmation by LC/MS/MS of THC-COOH, even if it will always be called “THC”. Look at the ratio.. What you CAN’T do with a urine test –

Implementing Google Analytics on WebPAC – The HKBU