
(1)

Stacked FSMD: A Power Efficient Micro-Architecture for High-Level Synthesis

Khushwinder Jasrotia, Jianwen Zhu

Electrical and Computer Engineering University of Toronto

March 24th, 2004

[email protected]

http://www.eecg.toronto.edu/~jzhu

(2)

Outline

Motivation

New Approach

Experimental Results

Conclusion

ISQED. Copyright © Khushwinder Jasrotia, March 24, 2004, ECE, Univ. of Toronto

(3)

High-Level Synthesis

Register-transfer level (RTL)
  Current industrial design standard
  Difficult for complex designs

Behavioral level
  The detailed design is abstracted away: timing, resource sharing
  A natural trend

High-level synthesis
  Automated refinement from the behavioral level to RTL

(4)

Then, why not High-Level Synthesis?

High-level synthesis still remains confined to academia and a few EDA companies

Today’s high-level synthesis tools have limited power
  For example, modular design is limited by:
    Expressive power of the languages
      Current HLS tools: VHDL, Verilog
      Software languages are better suited: C/C++, Java
      But handling procedures efficiently becomes a problem
    Traditional micro-architecture model
      Monolithic


(5)

Procedure Abstraction

Ability to efficiently handle procedures

Finite-state machine with datapath (FSMD)

  Classical model for HLS
  Monolithic

Inlining
  Flattened design increases control logic complexity
  Fails to take advantage of the mutually exclusive nature of procedures for resource sharing

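[Figure: FSMD block diagram. The control unit (state register, next-state logic, output logic) has control inputs and control outputs; it sends datapath control signals to the datapath unit and receives status signals from it. The datapath unit has datapath inputs and datapath outputs.]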

(6)

Previous Work

Camposano and Eijndhoven [ICCD97] implemented procedures as independent hardware modules that used handshaking signals for communication.

Gajski et al. [EDAC93] described a method in which each procedure occupied a portion of the main controller state table.

Vahid [ISFPGA97] described a method in which procedures were implemented as separate modules, with a common bus used to transfer address and parameter information between them.


(7)

Our Proposed Solution

Stacked FSMD model

Modified model from FSMD

Power-efficient for procedure abstraction

Region-based partitioning

Allows redefining procedure boundary

Behavioral power index

Assists partitioning decision

(8)

The SFSMD Model

Each procedure is implemented as a separate controller, but a common datapath is shared

Controlled by a stack controller

Benefits
  Saves power: only one controller is activated at a time
  Increases resource sharing: common datapath

[Figure: SFSMD block diagram. Controllers 1 to n each have Address, Call, Return, Enable, Datapath Control and Status ports. An address bus, a call signal and a return signal connect the controllers to the stack controller, which drives one enable line per controller (Cntrl #1 Enable to Cntrl #n Enable). Tri-state buffers sit between the controllers and the shared datapath.]
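The call/return behaviour of the stack controller can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `StackController`, its methods, and the choice of controller 0 as the main controller are hypothetical, and the slides do not say what state is saved on a call (here only the caller's index is pushed).

```python
from dataclasses import dataclass, field


@dataclass
class StackController:
    """Toy model (hypothetical): tracks which controller is enabled."""
    num_controllers: int
    active: int = 0                        # assumption: controller 0 is "main"
    stack: list = field(default_factory=list)

    def call(self, address: int) -> None:
        """The active controller raises Call with the callee's address."""
        if not 0 <= address < self.num_controllers:
            raise ValueError(f"no controller at address {address}")
        self.stack.append(self.active)     # remember which controller to resume
        self.active = address

    def ret(self) -> None:
        """The active controller raises Return; its caller is re-enabled."""
        if not self.stack:
            raise RuntimeError("return with an empty call stack")
        self.active = self.stack.pop()

    def enables(self) -> list:
        """One-hot enable lines: exactly one controller is active."""
        return [i == self.active for i in range(self.num_controllers)]


sc = StackController(num_controllers=3)
sc.call(1)                                 # main calls controller 1
sc.call(2)                                 # controller 1 calls controller 2
trace = [sc.enables()]
sc.ret()
trace.append(sc.enables())
sc.ret()
trace.append(sc.enables())
```

In every entry of `trace` exactly one enable line is set, which mirrors the claim that only one controller is activated at a time.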


(9)

More Power Saving with SFSMD

Power saving comes from activating one controller at a time

Breaking the procedure boundary
  Allows redefining the procedure boundary

Exlining
  Replaces a sequence of statements by a procedure call
  Inverse of inlining

Loop-based exlining
  Reduces switching activity: activity is localized
  Reduces power: only a small controller is activated at a time
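Exlining is easiest to see at the source level. The sketch below uses hypothetical function names (`scaled_dot_inlined`, `loop1`, `scaled_dot_exlined`) and is a software analogy only: in the SFSMD setting the exlined loop would become its own controller rather than a software function.

```python
def scaled_dot_inlined(a, b):
    """Original specification: the loop lives inside the caller."""
    total = 0
    for x, y in zip(a, b):          # this loop is the region to exline
        total += x * y
    return total * 2


def loop1(a, b):
    """Exlined region: the loop extracted into its own procedure."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def scaled_dot_exlined(a, b):
    """Caller after exlining: the statements are replaced by a call."""
    return loop1(a, b) * 2
```

Both versions compute the same result; only the procedure boundary has moved.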

(10)

Region-Based Partitioning

A region-based partitioning scheme is introduced to reduce power consumption.

It operates by extracting loops and implementing them as separate controllers in the SFSMD model.

[Figure: three panels. Original Specification: Loop1 to Loop4 inside one body. After Loop Exlining: the body contains Call Loop1 to Call Loop4, and each loop is a separate procedure. SFSMD Implementation: a main controller and Loop1 to Loop4 controllers, with a stack controller and a shared datapath.]


(11)

Region-Based Partitioning

The partitioning redefines the controller boundaries for the original specification by a series of exlining and inlining operations.

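[Figure: panels (a), (b) and (c) show the same specification with different controller boundaries. The elements are a main process, Procedure Foo (reached through Call Foo), Loop 1 and Loop 2, which are regrouped by inlining and exlining into a main process with Call Loop1 and Call Loop2.]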

(12)

Different Ways of Partitioning Code

Power tradeoff of exlining

+ Small logic circuit is activated at a time

- Increases inter-procedural communication

- Loss of control-step optimization opportunity

[Figure: five copies of the program tree (Main at the root; Loop1 and Loop2 below it; Loop3 and Loop4 at the lowest level), labelled Original Specification Tree, Partition 1 Tree, Partition 2 Tree, Partition 3 Tree and Partition 4 Tree.]


(13)

Partitioning-Index

For each partition P_j, a power index is defined as:

\[
\sum_{i=1}^{k} \Bigl( |\mathrm{States}(R_i)| \cdot \mathrm{Cycles}(R_i) + K \cdot \mathrm{Calls}(R_i) \Bigr), \qquad R_i \in P_j \tag{1}
\]

where

|States(R_i)| is the number of control steps used in the FSM of the controller for region i.

Cycles(R_i) is the number of control cycles spent in region i.

Calls(R_i) is the number of calls made to region i.
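As a worked illustration of the power index, the sketch below evaluates the sum for two partitions. Everything concrete here is an assumption: the `Region` type, the state, cycle and call counts, and the value of K are made up for the example, since the slides do not give a value for K or per-region counts.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    states: int   # |States(R_i)|: control steps in the region's FSM
    cycles: int   # Cycles(R_i): control cycles spent in the region
    calls: int    # Calls(R_i): number of calls made to the region


def power_index(partition, k):
    """Sum |States| * Cycles + K * Calls over all regions of a partition."""
    return sum(r.states * r.cycles + k * r.calls for r in partition)


# Hypothetical numbers, chosen only to exercise the formula.
unpartitioned = [Region("main", states=40, cycles=1000, calls=1)]
partitioned = [
    Region("main", states=12, cycles=100, calls=1),
    Region("loop1", states=8, cycles=900, calls=10),
]
K = 5  # assumed value; the slides do not state K

idx_unpartitioned = power_index(unpartitioned, K)
idx_partitioned = power_index(partitioned, K)
```

With these made-up numbers the partitioned design has the lower index, because most cycles are spent in a region whose controller has few states, while the K term charges for the extra calls.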

(14)

Experimental Procedure

Experiments were performed to:

Verify that the SFSMD model works.

Verify that region-based partitioning saves power, and estimate how much.

Confirm that the partition power-index values correlate well with the actual power of the partitions.


(15)

C Benchmark Kernels

Region-based partitioning was applied to the Livermore kernels written in C.

Power measurements were made by summing the individual energy contributions of the regions.

Only controller power was reported.

The Design Compiler tool from Synopsys was used to synthesize the designs, and the Power Compiler tool from Synopsys was used to report the power.

(16)

C Benchmark Kernels (Cont'd)

The kernels were manually partitioned into different regions and compiled into VHDL.

The partitions considered were horizontal “cuts” across the depth of the program tree.

The controller portions of the regions were synthesized and their power was compared with the unpartitioned design.

[Figure: the program tree (Main at the root; Loop1 and Loop2 below it; Loop3 at the lowest level) drawn three times, labelled Partition 0, Partition 1 and Partition 2.]
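One way to read the horizontal cuts is that every loop at or above the cut depth becomes its own region, while deeper loops stay inlined in their nearest exlined ancestor. The sketch below implements that reading; the function `horizontal_cut`, the dictionary encoding of the tree, the nesting of Loop3 under Loop2, and the interpretation of Partition 0 as "no loops exlined" are all assumptions, not details taken from the slides.

```python
def horizontal_cut(tree, root, level):
    """Return {region_root: [nodes in that region]} for a cut at `level`.

    Nodes at depth <= level become their own regions (exlined); deeper
    nodes stay inlined in their nearest ancestor that roots a region.
    """
    regions = {}

    def visit(node, depth, owner):
        if depth <= level:
            owner = node                  # node is exlined into its own region
            regions[owner] = []
        regions[owner].append(node)
        for child in tree.get(node, []):
            visit(child, depth + 1, owner)

    visit(root, 0, root)
    return regions


# Hypothetical program tree: Loop3 is assumed to be nested inside Loop2.
tree = {"Main": ["Loop1", "Loop2"], "Loop2": ["Loop3"]}
partition0 = horizontal_cut(tree, "Main", 0)
partition1 = horizontal_cut(tree, "Main", 1)
partition2 = horizontal_cut(tree, "Main", 2)
```

Under this reading, each deeper cut adds controllers: one region at level 0, three at level 1, and four at level 2 for the example tree.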


(17)

Power Reduction Results

Power consumption was reduced by 12% to 67% relative to the unpartitioned design, with an average area overhead of 5.5%.

Power Reduction, Partition Level 1 (bar chart; data labels listed per benchmark)

Benchmark   % Reduction
LL1_int     13
LL2_int     12
LL3_int     18
LL4_int     14
LL5_int     16
LL6_int     12
LL7_int     34
LL8_int     20
LL9_int     18
LL10_int    31
LL11_int    31
LL12_int    33
LL13_int    33
LL14_int    43
LL15_int    49
LL16_int    57
LL18_int    13
LL19_int    14
LL20_int    17
LL21_int    24
LL22_int    45
LL23_int    20
LL24_int    29

(18)

Power Reduction Results (Cont'd)

Power Reduction, Partition Levels 1 & 2 (bar chart; data labels listed per benchmark)

Benchmark   % Reduction, Partition 1   % Reduction, Partition 2
LL2_int     12                         21
LL4_int     14                         20
LL6_int     12                         28
LL8_int     20                         26
LL14_int    43                         32
LL15_int    49                         53
LL18_int    13                         67
LL19_int    14                         24
LL20_int    17                         25
LL21_int    24                         28
LL22_int    45                         50
LL23_int    20                         24
LL24_int    29                         40


(19)

Power Reduction Results (Cont'd)

Power Reduction, Partition Levels 1, 2 & 3 (bar chart; data labels listed per benchmark)

Benchmark   % Reduction, Partition 1   % Reduction, Partition 2   % Reduction, Partition 3
LL2_int     12                         21                         52
LL6_int     12                         28                         37
LL15_int    49                         53                         53
LL21_int    24                         28                         36
LL23_int    20                         24                         31

(20)

Partition Power-Index Results

Partition power-index values correlated well with actual power

Power, Partition Level 1 (bar chart; data labels listed per benchmark, in mW)

Benchmark   Unpartitioned Power   Partition Level 1 Power
LL1_int     1.3                   1.1
LL2_int     3.1                   2.7
LL3_int     1.4                   1.2
LL4_int     2.4                   2.0
LL5_int     1.4                   1.2
LL6_int     2.2                   1.9
LL7_int     3.0                   2.0
LL8_int     8.6                   6.9
LL9_int     3.5                   2.8
LL10_int    4.9                   3.4
LL11_int    1.4                   1.0
LL12_int    1.4                   1.0
LL13_int    4.0                   2.7
LL14_int    2.6                   1.5
LL15_int    4.5                   2.3
LL16_int    5.7                   2.4
LL18_int    7.5                   6.6
LL19_int    2.2                   1.9
LL20_int    3.7                   3.1
LL21_int    2.6                   2.0
LL22_int    1.9                   1.1
LL23_int    4.3                   3.4
LL24_int    2.1                   1.5

[Figure: Power Index, Partition Level 1. Bar chart of Unpartitioned Power Index and Partition 1 Power Index for the same 23 benchmarks; the power-index axis runs from 0 to 800000. No data labels survive.]


(21)

Partition Power-Index Results (Cont'd)

Power, Partition Levels 1 & 2 (bar chart; data labels listed per benchmark, in mW)

Benchmark   Unpartitioned Power   Partition Level 1 Power   Partition Level 2 Power
LL2_int     3.1                   2.7                       2.4
LL4_int     2.4                   2.0                       1.9
LL6_int     2.2                   1.9                       1.6
LL8_int     8.6                   6.9                       6.4
LL14_int    2.6                   1.5                       1.7
LL15_int    4.5                   2.3                       2.1
LL18_int    7.5                   6.6                       2.5
LL19_int    2.2                   1.9                       1.6
LL20_int    3.7                   3.1                       2.8
LL21_int    2.6                   2.0                       1.9
LL22_int    1.9                   1.1                       1.0
LL23_int    4.3                   3.4                       3.3
LL24_int    2.1                   1.5                       1.3

[Figure: Power Index, Partition Levels 1 & 2. Bar chart of the power index for the same benchmarks; only part of the axis (300000 to 800000) survives and no data labels are recoverable.]

(22)

Partition Power-Index Results (Cont'd)

Power for Partition Levels 1, 2 & 3 (bar chart; data labels listed per benchmark, in mW)

Benchmark   Unpartitioned Power   Partition 1 Power   Partition 2 Power   Partition 3 Power
LL2_int     3.07                  2.70                2.41                1.47
LL6_int     2.18                  1.91                1.58                1.36
LL15_int    4.47                  2.30                2.09                2.10
LL21_int    2.60                  1.96                1.87                1.66
LL23_int    4.33                  3.45                3.28                3.01

[Figure: Power Index, Partitions 1, 2 & 3. Bar chart of Unpartitioned, Partition 1, Partition 2 and Partition 3 Power Index for LL2_int, LL6_int, LL15_int, LL21_int and LL23_int; the power-index axis runs from 0 to 350000. No data labels survive.]
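The percentage reductions quoted in the talk can be recomputed from the power values on this slide. The sketch below assumes that the four rows of data labels on the Power chart are, in order, the unpartitioned design and Partitions 1, 2 and 3, with columns in benchmark order; that reading of the chart, and the names `power_mw` and `reductions`, are assumptions.

```python
# Controller power in mW, read from the Power chart on this slide.
# Each list is [unpartitioned, partition 1, partition 2, partition 3].
power_mw = {
    "LL2_int":  [3.07, 2.70, 2.41, 1.47],
    "LL6_int":  [2.18, 1.91, 1.58, 1.36],
    "LL15_int": [4.47, 2.30, 2.09, 2.10],
    "LL21_int": [2.60, 1.96, 1.87, 1.66],
    "LL23_int": [4.33, 3.45, 3.28, 3.01],
}


def reductions(row):
    """Percent reduction of each partition relative to the unpartitioned design."""
    base = row[0]
    return [100.0 * (base - p) / base for p in row[1:]]


reduction_pct = {name: reductions(row) for name, row in power_mw.items()}
```

For LL2_int this gives roughly 12%, 21% and 52% for Partitions 1, 2 and 3, in line with the reduction figures reported on the earlier results slides.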


(23)

Conclusion

The SFSMD model provides a good basis for procedure abstraction.

By extracting loops with high execution counts, the region-based partitioning technique can help to reduce controller power. Our experimental results demonstrate power reductions ranging from 12% to 67% over the unpartitioned design.

Because it correlates strongly with the measured power, the power index can effectively guide the partitioning decisions of a high-level partitioning tool.
