• No results found

Dataflow Programming with MaxCompiler

N/A
N/A
Protected

Academic year: 2021

Share "Dataflow Programming with MaxCompiler"

Copied!
30
0
0

Loading.... (view fulltext now)

Full text

(1)

Dataflow Programming

with MaxCompiler

(2)

Programming DFEs

MaxCompiler

Streaming Kernels

Compile and build

Java meta-programming

(3)

Reconfigurable Computing with DFEs

DSP Block

IO Block

Logic Cell (10

5

elements)

Xilinx Virtex-6 FPGA

(4)

DFE Acceleration Hardware Solutions

MaxRack 10U, 20U or 40U

MaxCard

1U server

4 MAX3 Cards

Intel Xeon CPUs

MaxNode

MaxRack

PCI-Express Gen 2

Typical 50W-80W

24-48GB RAM

10U, 20U or

40U Rack

(5)

Schematic entry of circuits

Traditional Hardware Description Languages

VHDL, Verilog, SystemC.org

Object-oriented languages

C/C++, Python, Java, and related languages

Functional languages: e.g. Haskell

High level interface: e.g. Mathematica, MatLab

Schematic block diagram e.g. Simulink

Domain specific languages (DSLs)

(6)

Accelerator Programming Models

DSL

DSL

DSL

DSL

Possible applications

Lev

el

o

f

A

b

str

ac

ti

o

n

Flexible Compiler System: MaxCompiler

Higher Level Libraries

Higher

Level

Libraries

(7)

Acceleration Development Flow

St ar t Original Application Identify code for acceleration and analyze bottlenecks Write MaxCompiler code Simulate Functions correctly? Build for Hardware

Integrate with Host code Meets performance goals? Accelerated Application

NO

YES

YES

NO

Transform app, architect and model performance

(8)

Complete development environment for Maxeler DFE

accelerator platforms

Write

MaxJ

code to describe the dataflow accelerator

MaxJ is an extension of Java for MaxCompiler

Execute the Java

generate the accelerator

C software on CPUs

uses

the accelerator

MaxCompiler

class MyAccelerator extends Kernel { public MyAccelerator(…) {

DFEVar x = io.input("x", dfeFloat(8, 24)); DFEVar y = io.input(“y", dfeFloat(8, 24));

DFEVar x2 = x * x; DFEVar y2 = y * y;

DFEVar result = x2 + y2 + 30;

io.output(“z", result, dfeFloat(8, 24)); }

(9)

Application Components

SLiC

MaxelerOS

Memory

CPU

DFE

Me

mo

ry

PCI Express

Kernels

* + +

Manager

Host application

(10)

Programming with MaxCompiler

Host Application *.c, *.f90 ...

User Input

Compiler Linker Executable

Output

MAX File Sim or H/W (.max) MaxCompiler

Output

MyKernel.maxj MyManager.maxj

User Input

MaxIDE

Rewrite only code to be accelerated

maxelerOS Library SLiC

(11)

for (int i =0; i < DATA_SIZE; i++) y[i]= x[i] * x[i] + 30;

Main

Memory

CPU

Host

Code

Host Code (.c)

Simple Application Example

int*x, *y;

30

i i i

x

x

y

(12)

PCI Express

Manager

DFE

Memory

MyManager (.maxj)

x x + 30 x

Manager m = new Manager(); Kernel k = new MyKernel(); m.setKernel(k); m.setIO( link(“x", CPU), m.build(); link(“y", CPU)); MyKernel(DATA_SIZE, x, DATA_SIZE*4, y, DATA_SIZE*4); for (int i =0; i < DATA_SIZE; i++) y[i]= x[i] * x[i] + 30;

Main

Memory

CPU

Host

Code

Host Code (.c)

Development Process

MaxCompilerRT

MaxelerOS

DFEVar x = io.input("x", dfeInt(32)); DFEVar result = x * x + 30;

io.output("y", result, dfeInt(32));

MyKernel (.maxj)

int*x, *y;

y

x

x

+

30

y

x

(13)

PCI Express

Manager

DFE

Memory

MyManager (.maxj)

Manager m = new Manager(); Kernel k = new MyKernel(); m.setKernel(k); m.setIO( link(“x", CPU), device = max_open_device(maxfile, "/dev/maxeler0"); MyKernel( DATA_SIZE, x, DATA_SIZE*4);

Main

Memory

CPU

Host

Code

Host Code (.c)

Development Process

MaxCompilerRT

MaxelerOS

DFEVar x = io.input("x", dfeInt(32)); DFEVar result = x * x + 30;

io.output("y", result, dfeInt(32));

MyKernel (.maxj)

int*x, *y;

x

y

link(“y", DRAM_LINEAR1D)); x x + 30 x

x

x

+

30

y

(14)

x

x

+

30

y

public class MyKernel extends Kernel {

public MyKernel (KernelParameters parameters) {

super(parameters);

DFEVar x = io.input("x", dfeInt(32)); DFEVar result = x * x + 30;

io.output("y", result, dfeInt(32)); }

}

(15)

x

x

+

30

y

Streaming Data through the Kernel

5 4 3 2 1 0

30

0

0

(16)

x

x

+

30

y

Streaming Data through the Kernel

5 4 3 2 1 0

30 31

31

1

1

(17)

x

x

+

30

y

Streaming Data through the Kernel

5 4 3 2 1 0

34

4

2

(18)

x

x

+

30

y

Streaming Data through the Kernel

5 4 3 2 1 0

30 31 34 39

39

9

3

(19)

x

x

+

30

y

Streaming Data through the Kernel

5 4 3 2 1 0

46

16

4

(20)

x

x

+

30

y

Streaming Data through the Kernel

5 4 3 2 1 0

30 31 34 39

46

55

55

25

5

(21)

Java program

generates

a MaxFile

when it runs

1. Compile the Java into .class files

2. Execute the .class file

Builds the dataflow graph in memory

Generates the hardware .max file

3. Link the generated .max file with your host program

4. Run the host program

Host code automatically configures DFE(s) and interacts

with them at run-time

(22)

You can use the full power of Java to write a program

that

generates

the dataflow graph

Java variables can be used as constants in hardware

int y; DFEVar x; x = x + y;

Hardware variables can not be read in Java!

Cannot do: int y; DFEVar x; y = x;

Java conditionals and loops choose

how

to generate

hardware

not make run-time decisions

(23)

Dataflow Graph Generation: Simple

What dataflow graph is generated?

DFEVar x = io.input(“x”, type);

DFEVar y;

y = x + 1;

io.output(“y”, y, type);

x

+

1

y

(24)

Dataflow Graph Generation: Simple

What dataflow graph is generated?

DFEVar x = io.input(“x”, type);

DFEVar y;

y = x + x + x;

io.output(“y”, y, type);

x

+

y

+

(25)

Dataflow Graph Generation: Variables

What’s the value of h if we stream in 1?

DFEVar h = io.input(“h”, type);

int s = 2;

s = s + 5

h = h + 10

h = h + s;

+

+

10

1

7

18

What’s the value of s if we stream in 1?

DFEVar h = io.input(“h”, type);

int s = 2;

s = s + 5

h = h + 10

(26)

Dataflow Graph Generation: Conditionals

What dataflow graph is generated?

DFEVar x = io.input(“x”, type);

int s = 10;

DFEVar y;

if (s < 100) { y = x + 1; }

else { y = x – 1; }

io.output(“y”, y, type);

What dataflow graph is generated?

DFEVar x = io.input(“x”, type);

DFEVar y;

if (x < 10) { y = x + 1; }

else { y = x – 1; }

io.output(“y”, y, type);

x

+

1

y

Compile error.

You can’t use the value of ‘x’ in a

Java conditional

(27)

Compute both values and use a

multiplexer.

x = control.mux(select, option0, option1, …, optionN)

x = select ? option1 : option0

Conditional Choice in Kernels

DFEVar x = io.input(“x”, type);

DFEVar y;

y = (x > 10) ? x + 1 : x – 1

io.output(“y”, y, type);

Ternary-if operator is

overloaded

x

+

1

-

1

>

10

(28)

Dataflow Graph Generation: Java Loops

What dataflow graph is generated?

DFEVar x = io.input(“x”, type);

DFEVar y = x;

for (int i = 1; i <= 3; i++) {

y = y + i;

}

io.output(“y”, y, type);

x

+

1

y

+

2

+

3

Can make the loop any size – until you run out

of space on the chip!

Larger loops can be partially unrolled in space

and used multiple times in time – see Lecture on

loops and cyclic graphs

(29)

Real data flow graph as

generated by MaxCompiler

4866 nodes;

(30)

1.

Write a MaxCompiler kernel program that takes three input streams

x,

y and

z

which are

dfeInt(32) and computes an output stream p, where:

2.

Draw the dataflow graph generated by the following program:

Exercises

for (int i = 0; i < 6; i++) {

DFEVar x = io.input(“x”+i, dfeInt(32)); DFEVar y = x; if (i % 3 != 0) { for (int j = 0; j < 3; j++) { y = y + x*j; } } else { y = y * y; } io.output(“y”+i, y, dfeInt(32));

otherwise

z

if

z

if

)

(

2

)

(

2

)

(

i i i i i i i i i i i i

x

x

z

y

x

z

x

y

x

p



References

Related documents

This study and investigate focused on the number of rules in SIFLC, lookup table, slope of a linear equation, and also model reference to give optimum performances of depth

reads k&lt;=length characters from the stream, puts those characters starting from b[offset], and returns the total number of characters read. If the end of stream is reached,

[r]

The lower court was correct in applying Article 559 of the Civil Code to the case at bar, for under it, the rule is to the effect that if the owner has lost a thing, or if he has

resistances in downtrends. We will not go into the details of why and how of this belief. Instead, we will look at the how volume and spread can give us clues whether the trend

(Trimax) is a leading end-to-end service provider of IT Services and Solutions encompassing Integration, Data Centre, Management of IT Infrastructure, and Networking.  Firm has

Generalize to Higher Generalize to Higher Dimensions Dimensions X Y Z P’ (x/z, y/z, 1) P (x, y, z) 2D points represented by homogeneous coordinates?. Similarly, 3D points are

Be it resolved that the Netcong Board of Education, upon the recommendation of the Superintendent, hereby approves the following resolution of the Board of Education of the