
Rule-Based Program Transformation for Hybrid Architectures

CSW Workshop: Towards Portable Libraries for Hybrid Systems


Academic year: 2021


M. Carro (1,2), S. Tamarit (2), G. Vigueras (1), J. Mariño (2)


General Observation

• Cost: develop and maintain code adequate for hybrid architecture(s).

Every sub-platform different approaches / needs.

• Libraries: unified API, code adapted / optimized for some architecture.

• But:

1. Optimizations across library boundaries difficult.

2. Maintaining code for several platforms costly.

3. Porting among platforms costly.

• What is the right balance?

• We take the extreme position:

Optimize all you can (source code needed). As automatically as possible.


Overview

Source-to-source transformation of procedural code.

• Semantics-preserving.

• Modifying certain non-functional characteristics: speed, number of (FP) instructions, cache hits, communication, placement of kernels, . . .

• Performed by applying transformation rules.

Transformation rules:

• Match fragments to transform.

• Identify what they should be transformed into.

• Generate new code when certain conditions are met.

Conditions captured with annotations:

• Algorithmic structure of code.


Transformation Scheme

[Figure: Initial code → Preparation → Ready code → Translation → Translated code for GPGPU (OpenCL), OpenMP, MPI, FPGA (MaxJ)]

Preparation:

• Generic C → generic C.

• Structural changes.

• Platform-independent.

• With a code shape adequate for some target platform.

Translation:

• Rewrites generic code to different target architectures.

• Uses procedural-level information.


Running Example: RGB Filter


void kernelRedFilter(...) {
    for (j = 0; j < width; j += 3) {
        // Generate one row of the output image
    }
}

void rgbImageFilter(...) {
    for (i = 0; i < height; i++)
        kernelRedFilter(...);
    // Same for green and blue
}

int main(...) {
    // Read image
    rgbImageFilter(...);
    // Write images
}


Annotated Code

#pragma polca kernel opencl
#pragma polca map image redImage
for (i = 0; i < height; i++)
    kernelRedFilter(image, i, rawWidth, redImage);

• Hint: farming out to a GPGPU.

• Information regarding target architecture: either from programmer or from analysis of architecture / task graph.

• Annotations: what does the code block do?


Information from Annotations

What map tells us

#pragma polca map image redImage

for (i = 0; i < height; i++)

kernelRedFilter(image, i, rawWidth, redImage);

• Input image, output redImage.

• For every image[i], redImage[i] is produced using only image[i].

• No global variables, no dependencies across iterations.

• Computation of redImage[i] enclosed inside kernelRedFilter().

⇒ Pragma as summary of simpler properties.
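The properties that map summarizes can be checked on a small C sketch. The image layout (rows of rawWidth bytes holding interleaved R, G, B samples), the body of kernelRedFilter, and the two driver functions are assumptions made for illustration; the slides elide the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout (hypothetical): each row holds rawWidth bytes of
   interleaved R, G, B samples, so rawWidth is a multiple of 3. */
void kernelRedFilter(const unsigned char *image, int i, int rawWidth,
                     unsigned char *redImage)
{
    const unsigned char *in = image + (size_t)i * rawWidth;
    unsigned char *out = redImage + (size_t)i * rawWidth;
    assert(rawWidth % 3 == 0);
    for (int j = 0; j < rawWidth; j += 3) {
        out[j] = in[j];      /* keep the red sample   */
        out[j + 1] = 0;      /* drop the green sample */
        out[j + 2] = 0;      /* drop the blue sample  */
    }
}

/* Row i of redImage is computed from row i of image only. */
void redFilterRows(const unsigned char *image, int height, int rawWidth,
                   unsigned char *redImage)
{
    for (int i = 0; i < height; i++)
        kernelRedFilter(image, i, rawWidth, redImage);
}

/* Same rows visited in reverse order: gives the same result because
   there are no dependencies across iterations. */
void redFilterRowsReversed(const unsigned char *image, int height,
                           int rawWidth, unsigned char *redImage)
{
    for (int i = height - 1; i >= 0; i--)
        kernelRedFilter(image, i, rawWidth, redImage);
}
```

Because the iteration order does not matter, each row can be handed to a separate worker, which is what makes farming the loop out to a GPGPU plausible.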


www.imdea.org


Transformation rules

• Match input code, check conditions, generate output.

• Focused on procedural properties / structures.

(Simplified) Transformation Rule: Simplify Loops

one_iteration_active {
  pattern: {
    for (cexpr(i) = cexpr(ini); cexpr(i) < cexpr(n); cexpr(i)++)
      if (cexpr(i) == cexpr(other))
        cstmt(one_stat);
  }
  generate: {
    subs(cstmt(one_stat), cexpr(i), cexpr(other));
  }
}

Input:

for (i = 0; i < N; i++)
  if (i == k)
    c = v[i];

Output:

c = v[k];
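The effect of this rule can be checked with a small C sketch that runs both shapes on the same input. The function names are hypothetical, and the precondition 0 <= k < n is an assumption added here; the simplified rule on the slide does not state it.

```c
#include <assert.h>

/* Before: the loop visits every index, but only iteration i == k acts. */
int pick_with_loop(const int *v, int n, int k)
{
    int c = 0;
    assert(0 <= k && k < n);   /* assumed precondition, not in the slide */
    for (int i = 0; i < n; i++)
        if (i == k)
            c = v[i];
    return c;
}

/* After: the loop is gone and i is replaced by k in the statement. */
int pick_direct(const int *v, int n, int k)
{
    assert(0 <= k && k < n);   /* same assumed precondition */
    return v[k];
}
```

Under that precondition both functions return the same value for every k, while the second does constant work instead of n iterations.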


Rule Chaining

• Rule chaining: result of rule can be input for another rule.

• Generation of OpenCL in example: loop fusion (2 times) + inlining (3 times) + loop fusion (2 times) + iteration step normalization + loop collapse

• Search space.

• Use metrics on non-functional properties to drive search.

• Target-dependent.
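Two of the steps named in the chain, iteration step normalization and loop collapse, can be sketched in C as follows. This shows what these transformations usually mean, not the tool's actual rules; the function names and the loop body are hypothetical.

```c
#include <assert.h>

/* Original shape: nested loops, inner loop advancing in steps of 3. */
void fill_nested(int *out, int height, int rawWidth)
{
    for (int i = 0; i < height; i++)
        for (int j = 0; j < rawWidth; j += 3)
            out[i * rawWidth + j] = i + j;
}

/* After step normalization (unit step) and loop collapse (one loop). */
void fill_collapsed(int *out, int height, int rawWidth)
{
    int perRow = (rawWidth + 2) / 3;   /* iterations of the old inner loop */
    for (int t = 0; t < height * perRow; t++) {
        int i = t / perRow;            /* recovered outer index */
        int j = 3 * (t % perRow);      /* recovered inner index */
        out[i * rawWidth + j] = i + j;
    }
}
```

The collapsed version exposes a single flat iteration space, which is the kind of shape that maps directly onto one task per iteration.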


Infrastructure and Tool

• Current choice: Haskell + cpp.

• Previously: Clang; proved to be tedious and error prone.

• Transformations had to be written in C++.

• AST not designed for source-to-source program transformation.

• The tool:

• Reads external rules in a C-like language (STML).

• The tool is parametric in the rules: new, tailored rules can be developed.

• Rules can include conditions, which are either functional pragmas or procedural pragmas.


Demo available — Contact us

Step-by-step transformation from C code to OpenCL.

• RGB filter C code → ready C code.

• Ready C code → OpenCL, OpenMP, MPI, MaxJ.

• CPU → GPGPU: ≈ 100× speedup in computation time, but data transfer dominates in this case.

Other examples

• Reduction of complexity order O(n^3) → O(n^2) [→ O(1)] for matrix multiplication, given properties of one input matrix.

• Transformation of c = a·v + b·v into c = k·v, where k = a + b. Includes loop fusion and code hoisting; saves run time and FP operations.

• Transformation of original C code into ready C code for MaxJ (non-trivial transformations).
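The c = a·v + b·v example can be sketched in C as a before/after pair. The function names are hypothetical, and the starting shape (two separate loops) is an assumption about what the original code looked like. The two versions agree exactly in real arithmetic; in floating point they may differ by rounding.

```c
#include <assert.h>

/* Before: two passes over v, with 2 multiplications and 1 addition
   per element. */
void scale_sum_naive(const double *v, int n, double a, double b, double *c)
{
    for (int i = 0; i < n; i++)
        c[i] = a * v[i];
    for (int i = 0; i < n; i++)
        c[i] = c[i] + b * v[i];
}

/* After loop fusion, factoring out v[i], and hoisting k = a + b out of
   the loop: one pass, 1 multiplication per element. */
void scale_sum_fused(const double *v, int n, double a, double b, double *c)
{
    double k = a + b;              /* loop-invariant, computed once */
    for (int i = 0; i < n; i++)
        c[i] = k * v[i];
}
```

For n elements the naive version performs 3n floating-point operations and the fused one n + 1, which is where the saving in FP operations comes from.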


Future Steps

• Apply to more complex code and transformation rules.

• Perform realistic tests of said rules on actual architectures (ongoing).

• Improve description and handling of code blocks to handle more code structures.

• Improve handling of different compilation targets.

• Interface with external tools to reduce annotations.

• Would probably need feedback to / from the user.


Translation

• Code generated with adequate shape + annotations.

• E.g., shape for a GPGPU different from shape for OMP

• OMP: split initial image among available threads.

• GPGPU / OpenCL: every loop iteration goes to one task.

• Annotations include, e.g.,

• Explicit independence between iterations or code fragments.

• Information about loop splitting.
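The difference in shape between the two targets can be sketched in plain C, without using OpenMP or OpenCL themselves. Everything here is hypothetical: process_row stands in for the per-row kernel, the outer loop over t stands in for the threads, and the loop over gid stands in for the runtime launching one task per iteration.

```c
#include <assert.h>

/* Hypothetical per-row work; stands in for the real kernel. */
static void process_row(int *rowsDone, int i) { rowsDone[i] += 1; }

/* OMP-like shape: rows split into nthreads contiguous chunks; each
   chunk would be executed by one thread. */
void omp_shape(int *rowsDone, int height, int nthreads)
{
    assert(nthreads > 0);
    int chunk = (height + nthreads - 1) / nthreads;   /* ceiling division */
    for (int t = 0; t < nthreads; t++) {              /* one pass per thread */
        int lo = t * chunk;
        int hi = lo + chunk < height ? lo + chunk : height;
        for (int i = lo; i < hi; i++)
            process_row(rowsDone, i);
    }
}

/* GPGPU-like shape: the loop body becomes a kernel indexed by a task id. */
static void kernel_body(int *rowsDone, int gid) { process_row(rowsDone, gid); }

void gpgpu_shape(int *rowsDone, int height)
{
    for (int gid = 0; gid < height; gid++)   /* stands in for the launch */
        kernel_body(rowsDone, gid);
}
```

Both shapes process every row exactly once; they differ only in how the iteration space is partitioned, which is the information the annotations on the ready code need to carry.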


Loop Fusion Transformation Rule (Simplified)

for_loop_fusion_pragma {
  pattern: {
    cstmts(ini);
    #pragma polca def a
    #pragma polca map inputa outputa
    for (cexpr(i) = cexpr(init); cexpr(i) < cexpr(n); cexpr(modi)) {
      cstmts(bodyFOR1);
    }
    cstmts(mid);
    #pragma polca def b
    #pragma polca map inputb outputb
    for (cexpr(j) = cexpr(init); cexpr(j) < cexpr(n); cexpr(modj)) {
      cstmts(bodyFOR2);
    }
    cstmts(fin);
  }
  condition: {
    no_reads(cexpr(i), cstmts(mid));
    no_reads(cexpr(j), cstmts(fin));
  }
  generate: {
    cstmts(ini);
    cstmts(mid);
    #pragma polca same_properties a
    #pragma polca same_properties b
    for (cexpr(i) = cexpr(init); cexpr(i) < cexpr(n); cexpr(modi)) {
      cstmts(bodyFOR1);
      subs(cstmts(bodyFOR2), cexpr(j), cexpr(i));
    }
    cstmts(fin);
  }
}
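A concrete instance of this rule can be sketched in C. The function names, loop bodies, and the mid statement are hypothetical. The sketch additionally assumes that the arrays do not overlap and that mid does not touch them, which goes beyond the conditions listed in the simplified rule.

```c
#include <assert.h>

/* Before: two map-like loops over the same range, separated by a
   statement (mid) that reads neither loop index. */
void two_loops(const int *a, const int *b, int n,
               int *outA, int *outB, int *count)
{
    for (int i = 0; i < n; i++)
        outA[i] = 2 * a[i];          /* bodyFOR1 */
    *count = n;                      /* mid */
    for (int j = 0; j < n; j++)
        outB[j] = b[j] + 1;          /* bodyFOR2 */
}

/* After: mid moved before the loop, bodies fused, j replaced by i. */
void fused_loop(const int *a, const int *b, int n,
                int *outA, int *outB, int *count)
{
    *count = n;                      /* mid */
    for (int i = 0; i < n; i++) {
        outA[i] = 2 * a[i];          /* bodyFOR1 */
        outB[i] = b[i] + 1;          /* bodyFOR2 with j := i */
    }
}
```

The fused version traverses the index range once instead of twice, and the resulting single loop is again a candidate for further rules.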

References

Salvador Tamarit, Guillermo Vigueras, Manuel Carro, and Julio Mariño. A Haskell Implementation of a Rule-Based Program Transformation for C Programs. In Enrico Pontelli and Tran Cao Son, editors, International Symposium on Practical Aspects of Declarative Languages, number 9131 in LNCS, page TBD. Springer-Verlag, June 2015.

Guillermo Vigueras, Salvador Tamarit, Manuel Carro, and Julio Mariño. Towards a Rule-Based Approach to Generate High-Performance Scientific Code. Poster presented at the 2015 HiPEAC Conference, Amsterdam, January 2015.
