Academic year: 2021

ELEC3017 - RTL 1

RTL Design

Gate-level design is now rare!

– design automation is necessary to manage the complexity of modern circuits

– only library designers use gates

– automated RTL synthesis is now almost universal

RTL = Register Transfer Level

– The design is perceived as a number of registers with transfer functions which transform data as it passes from one register to another

– this is a synchronous methodology

– chosen as the methodology for input to synthesis

RTL Overview

[Figure: block diagram of a controller driving a data path of registers and transfer functions through the control signals load, fn and sel]

RTL Design Steps

There are typically 8 steps in the RTL design process

1. Create a Dependency Graph for the Data Path
2. Determine the widths of the data paths
3. Decide what resources to provide
4. Allocate operations to resources and schedule them
5. Allocate registers to intermediate results
6. Share registers
7. Design the controller
8. Design the reset/initialisation mechanism

The order of these steps may vary and may be iterated

Example

Scalar-product calculator

Not a realistic design, but shows the elements of RTL design

The example will work on 8 element vectors.
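As a reference point for the steps that follow, the calculation itself can be sketched in a few lines of Python. The function name `scalar_product` and the sample vectors are illustrative assumptions, not part of the original design.

```python
def scalar_product(a, b):
    """Reference model: z = a[0]*b[0] + a[1]*b[1] + ... + a[7]*b[7]."""
    assert len(a) == len(b) == 8, "the example works on 8-element vectors"
    return sum(x * y for x, y in zip(a, b))

# Illustrative inputs only
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 1, 1, 1, 1, 1, 1, 1]
z = scalar_product(a, b)
```

Any hardware schedule derived later should produce the same value as this model for the same inputs.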

Step 1: Create Dependency Graph for Data Path

So, the data operations are:

– 8 multiplications
– one 8-way addition

Step 1b: 8-way Addition?

[Figure: the 8-way addition decomposed into two-input additions, drawn once as a balanced binary tree and once as a skewed binary tree]
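Both trees use seven two-input additions; they differ in how many additions lie on the longest path. The sketch below counts both quantities. The function names and the sample values are assumptions made for illustration.

```python
def balanced_sum(values):
    """Add pairwise, level by level; returns (total, additions, depth)."""
    level, additions, depth = list(values), 0, 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        additions += len(nxt)
        if len(level) % 2:          # an odd element passes through unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], additions, depth

def skewed_sum(values):
    """Add one operand at a time; returns (total, additions, depth)."""
    total = values[0]
    for v in values[1:]:
        total += v
    n = len(values) - 1             # every addition waits for the previous one
    return total, n, n

products = [3, 1, 4, 1, 5, 9, 2, 6]  # stand-ins for p0..p7
```

For eight operands the balanced tree has depth 3 and the skewed tree depth 7, with the same total and the same number of additions.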

Step 1c: Identify and Make Unique Data Paths

Create what is known as the Single Assignment Form

p0 := a0 * b0;
p1 := a1 * b1;
p2 := a2 * b2;
...
z1 := p0 + p1;
z2 := z1 + p2;
z3 := z2 + p3;
...
z := z6 + p7;

• Each variable assigned only once
• Each statement uses just one operator
• Create new intermediate variables to achieve this
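The two rules can be checked mechanically. The following sketch generates the full statement list for the skewed tree and tests both properties; `single_assignment` and `check_single_assignment` are hypothetical helper names.

```python
def single_assignment(n=8):
    """Emit the skewed-tree scalar product in single assignment form."""
    lines = [f"p{i} := a{i} * b{i};" for i in range(n)]
    lines.append("z1 := p0 + p1;")
    for k in range(2, n - 1):
        lines.append(f"z{k} := z{k - 1} + p{k};")
    lines.append(f"z := z{n - 2} + p{n - 1};")
    return lines

def check_single_assignment(lines):
    """True if each target is assigned once and each line has one operator."""
    targets = [ln.split(":=")[0].strip() for ln in lines]
    one_op = all(sum(ln.count(op) for op in "*+") == 1 for ln in lines)
    return len(set(targets)) == len(targets) and one_op

for line in single_assignment():
    print(line)
```

For n = 8 this yields eight multiplications and seven additions, fifteen statements in total.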

Step 2: Determine Data Path Widths

In most cases, this is defined for I/O as part of the specification.

The designer typically then has to decide on internal data path widths.

In this case we’ll assume the specification:

– Input is 8-bit 2’s-complement
– Output is 16-bit 2’s-complement

To achieve this output precision, internal paths must be 16-bit 2’s-complement

Precision of Operations:

– all multiplications are 8-bit input, 16-bit output
– the addition is 16-bit input, 16-bit output
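The widths can be sanity-checked with a little arithmetic. The sketch below confirms that any product of two 8-bit 2's-complement values fits in 16 bits, and also shows that the sum of eight extreme products does not, so a 16-bit adder wraps for such inputs. The slides do not discuss this case; the helper names are assumptions.

```python
def fits_signed(value, bits):
    """True if value is representable in `bits`-bit 2's complement."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

def wrap_signed(value, bits):
    """Wrap value to `bits`-bit 2's complement, as a fixed-width adder does."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

largest_product = (-128) * (-128)       # extreme 8-bit operands
worst_case_sum = 8 * largest_product    # eight such products accumulated

print(fits_signed(largest_product, 16))  # product width is sufficient
print(fits_signed(worst_case_sum, 16))   # accumulated sum can overflow
```

Whether wrap-around matters depends on the input ranges the specification actually guarantees.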

Step 3: Choose Resources to Provide

As a first solution, target a minimum-area implementation

– one multiplier, 8-bit inputs, 16-bit output
– one adder, 16-bit inputs, 16-bit output

These resources will be shared – remember that the original algorithm requires the following operations:

– eight multiplications
– seven additions

Step 4: Allocate/Schedule Operations

This stage of the design determines which operations are to be performed by which resources at which time

– Allocation – which resource

– Scheduling – what time (i.e. which clock cycle)

Allocation for this example is trivial

– all additions are allocated to the adder
– all multiplications are allocated to the multiplier

Scheduling has many permutations

– it doesn’t matter what order multiplications are performed
– it doesn’t matter what order additions are performed
– addition is associative and commutative

Step 4b: Allocate/Schedule Operations

Cycle  mult1           add1
1      a0 * b0 → p0    -
2      a1 * b1 → p1    p0 + p1 → z1
3      a2 * b2 → p2    z1 + p2 → z2
4      a3 * b3 → p3    z2 + p3 → z3
5      a4 * b4 → p4    z3 + p4 → z4
6      a5 * b5 → p5    z4 + p5 → z5
7      a6 * b6 → p6    z5 + p6 → z6
8      a7 * b7 → p7    z6 + p7 → z

Step 4c: Simplify Schedule

Cycle  mult1           add1
1      a0 * b0 → p0    0 + p0 → z0
2      a1 * b1 → p1    z0 + p1 → z1
3      a2 * b2 → p2    z1 + p2 → z2
4      a3 * b3 → p3    z2 + p3 → z3
5      a4 * b4 → p4    z3 + p4 → z4
6      a5 * b5 → p5    z4 + p5 → z5
7      a6 * b6 → p6    z5 + p6 → z6
8      a7 * b7 → p7    z6 + p7 → z

Step 5: Allocate registers to intermediate results

Every variable in the scheduling that is generated in one cycle and used in another must be registered.

The proposed schedule only requires the outputs z0…z to be registered, not the products, because each product is consumed by the adder in the same cycle that produces it:

Cycle  mult1           add1
1      a0 * b0 → p0    0 + p0 → z0
2      a1 * b1 → p1    z0 + p1 → z1

However, the schedule does assume that a multiply-accumulate can be done in one clock cycle.

Step 6: Share Registers

A register can be shared between variables if their lifetimes do not intersect. Here each partial sum is produced in one cycle and consumed in the next:

Cycle  Variable produced
1      z0
2      z1
3      z2
4      z3
5      z4
6      z5
7      z6
8      z
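The sharing rule can be sketched as a greedy allocation over lifetime intervals, in the style of a left-edge algorithm. The function name, the interval encoding, and the assumption that the final result z is read in a ninth cycle are all illustrative choices, not taken from the slides.

```python
def share_registers(lifetimes):
    """Greedy allocation of variables to registers.

    lifetimes maps a variable to (written, last_read): the value is stored
    at the end of cycle `written` and is needed up to cycle `last_read`.
    Returns a list of registers, each a list of the variables sharing it.
    """
    registers = []                      # each entry: [free_from_cycle, names]
    ordered = sorted(lifetimes.items(), key=lambda kv: kv[1])
    for name, (written, last_read) in ordered:
        for reg in registers:
            if reg[0] <= written:       # the previous value is dead by then
                reg[0] = last_read
                reg[1].append(name)
                break
        else:
            registers.append([last_read, [name]])
    return [names for _, names in registers]

# z0 is written in cycle 1 and read in cycle 2, and so on.
lifetimes = {f"z{k}": (k + 1, k + 2) for k in range(7)}
lifetimes["z"] = (8, 9)                 # assumed: the result is read afterwards
print(share_registers(lifetimes))
```

For this schedule every variable lands in the same register, which is why a single accumulator suffices.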

Step 6b: Simplify Registers

Cycle  mult1           add1
1      a0 * b0 → p0    0 + p0 → z
2      a1 * b1 → p1    z + p1 → z
3      a2 * b2 → p2    z + p2 → z
4      a3 * b3 → p3    z + p3 → z
5      a4 * b4 → p4    z + p4 → z
6      a5 * b5 → p5    z + p5 → z
7      a6 * b6 → p6    z + p6 → z
8      a7 * b7 → p7    z + p7 → z
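The schedule can be stepped through cycle by cycle in software. This is a behavioural sketch only; `run_schedule` is a hypothetical name, and register widths and overflow are ignored.

```python
def run_schedule(a, b):
    """Step through the eight-cycle schedule with a single register z."""
    z = None                              # register contents unknown at start
    trace = []
    for cycle in range(1, 9):
        i = cycle - 1
        product = a[i] * b[i]             # mult1: a_i * b_i -> p_i
        add_in = 0 if cycle == 1 else z   # zero is selected in cycle 1 only
        z = add_in + product              # add1 result is loaded into z
        trace.append((cycle, product, z))
    return z, trace

result, trace = run_schedule([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
for row in trace:
    print(row)
```

Because cycle 1 selects zero instead of the register, the old contents of z never reach the result.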

Design so Far

The data path design is now complete:

[Figure: data path in which mult1 feeds the adder, whose result is stored in the register z]

Step 7: Design the controller

The controller is responsible for:

– routing operation inputs to resource inputs at the scheduled cycle
– enabling registers to store intermediate results

Cycle  a mux      b mux      zero mux  register
1      select a0  select b0  select 0  load add1
2      select a1  select b1  select z  load add1
3      select a2  select b2  select z  load add1
4      select a3  select b3  select z  load add1
5      select a4  select b4  select z  load add1
6      select a5  select b5  select z  load add1
7      select a6  select b6  select z  load add1
8      select a7  select b7  select z  load add1

Step 7b: Design the Controller

In this case the controller becomes a simple 3-bit counter to control the muxes

The register is always enabled

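[Figure: the counter output drives the a and b mux selects; a comparison of the count with 0 drives the zero mux]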

Step 8: Design Reset Mechanism

Most systems require a reset mechanism

This may be synchronous or asynchronous – this will be in the specification

Most RTL designs are reset by resetting the controller

The controller then resets the data path (as in this example)

In this example, the reset simply puts the counter in its start state (count = 0)

[Figure: the reset input drives the counter's clear input]
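A behavioural sketch of the counter-based controller makes the reset behaviour concrete. It assumes a synchronous reset, as in the VHDL description; the class name `Controller` and its methods are illustrative.

```python
class Controller:
    """3-bit counter: count selects a_i and b_i, count == 0 selects zero."""

    def __init__(self):
        self.count = 0                    # assumed initial value for the model

    def clock(self, reset=False):
        """One rising clock edge with a synchronous reset."""
        self.count = 0 if reset else (self.count + 1) % 8
        return self.count

    def select_zero(self):
        """Zero mux control: true only in the start state."""
        return self.count == 0

ctrl = Controller()
for _ in range(5):
    ctrl.clock()
mid_run = ctrl.count                      # part-way through a calculation
ctrl.clock(reset=True)
after_reset = ctrl.count                  # back in the start state
```

Resetting the counter is enough to restart the data path, because the start state selects zero rather than the stale register contents.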

VHDL Description

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package cross_product_types is
  subtype sig8 is signed (7 downto 0);
  subtype sig16 is signed (15 downto 0);
  type sig8_vector is array (natural range <>) of sig8;
end;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.cross_product_types.all;

entity cross_product is
  port (a, b      : in  sig8_vector(7 downto 0);
        ck, reset : in  std_logic;
        z         : out sig16);
end;

architecture RTL of cross_product is
  signal i : unsigned(2 downto 0);
  signal ai, bi : sig8;
  signal product, add_in, sum, accumulator : sig16;
begin

  -- Controller: 3-bit counter with synchronous reset
  control: process (ck)
  begin
    if ck'event and ck = '1' then
      if reset = '1' then
        i <= "000";
      else
        i <= i + 1;
      end if;
    end if;
  end process;

  -- Data path; an array index must be an integer, hence to_integer
  a_mux: ai <= a(to_integer(i));
  b_mux: bi <= b(to_integer(i));
  z_mux: add_in <= X"0000" when i = 0 else accumulator;
  multiply: product <= ai * bi;
  add: sum <= product + add_in;

  -- Accumulator register, always enabled
  process (ck)
  begin
    if ck'event and ck = '1' then
      accumulator <= sum;
    end if;
  end process;

  output: z <= accumulator;
end;

More Complex Schedules

e.g. Zwolinski p197 (simplified)

input_sum := input;
for j in 0 to order-1 loop
  input_sum := input_sum + delay(j) * coeffb(j);
end loop;
output_sum := input_sum * coeffa(order);
for k in 0 to order loop
  output_sum := output_sum + delay(k) * coeffa(k);
end loop;
output := output_sum;

For order = 1 this unrolls to:

input_sum := input;
input_sum := input_sum + delay(0) * coeffb(0);
output_sum := input_sum * coeffa(1);
output_sum := output_sum + delay(0) * coeffa(0);
output_sum := output_sum + delay(1) * coeffa(1);
output := output_sum;

Single Assignment Form

p0 := delay(0) * coeffb(0);
s0 := input + p0;
s1 := s0 * coeffa(1);
p1 := delay(0) * coeffa(0);
s2 := s1 + p1;
p2 := delay(1) * coeffa(1);
output := s2 + p2;

• 3 adds
• 4 multiplies
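A quick way to gain confidence in a single assignment rewrite is to evaluate it against the unrolled code for the same inputs. The sketch below does this for order = 1; the function names and the test values are arbitrary assumptions.

```python
def filter_unrolled(inp, delay, coeffa, coeffb):
    """The unrolled code for order = 1, with variables reassigned."""
    input_sum = inp
    input_sum = input_sum + delay[0] * coeffb[0]
    output_sum = input_sum * coeffa[1]
    output_sum = output_sum + delay[0] * coeffa[0]
    output_sum = output_sum + delay[1] * coeffa[1]
    return output_sum

def filter_single_assignment(inp, delay, coeffa, coeffb):
    """The same computation with each variable assigned exactly once."""
    p0 = delay[0] * coeffb[0]
    s0 = inp + p0
    s1 = s0 * coeffa[1]
    p1 = delay[0] * coeffa[0]
    s2 = s1 + p1
    p2 = delay[1] * coeffa[1]
    return s2 + p2

args = (2, [3, 4], [5, 6], [7])           # input, delay, coeffa, coeffb
print(filter_unrolled(*args), filter_single_assignment(*args))
```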

Data Dependency Graph

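[Figure: data dependency graph. Inputs: input, delay(0), delay(1), coeffb(0), coeffa(0), coeffa(1). Four multiplication nodes and three addition nodes produce the intermediate values p0, s0, s1, p1, s2, p2 and the output.]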

ASAP Schedule

[Figure: ASAP (as soon as possible) schedule of the dependency graph over cycles 1 to 5]

• 1 adder
• 3 multipliers
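An ASAP schedule places each operation in the earliest cycle after all of its predecessors. The sketch below applies that rule to the single assignment form and counts the units needed; it assumes every operation takes one cycle, and the names `OPS`, `asap` and `units_needed` are illustrative.

```python
# Dependency graph from the single assignment form: op -> (unit, predecessors)
OPS = {
    "p0": ("mul", []),
    "s0": ("add", ["p0"]),
    "s1": ("mul", ["s0"]),
    "p1": ("mul", []),
    "s2": ("add", ["s1", "p1"]),
    "p2": ("mul", []),
    "output": ("add", ["s2", "p2"]),
}

def asap(ops):
    """Place each operation in the earliest cycle after its predecessors."""
    cycle = {}
    def place(name):
        if name not in cycle:
            preds = ops[name][1]
            cycle[name] = 1 + max((place(p) for p in preds), default=0)
        return cycle[name]
    for name in ops:
        place(name)
    return cycle

def units_needed(ops, cycle):
    """Peak number of operations of each kind in any single cycle."""
    peak = {}
    for kind in {k for k, _ in ops.values()}:
        per_cycle = {}
        for name, c in cycle.items():
            if ops[name][0] == kind:
                per_cycle[c] = per_cycle.get(c, 0) + 1
        peak[kind] = max(per_cycle.values())
    return peak

schedule = asap(OPS)
print(schedule, units_needed(OPS, schedule))
```

All three input-only multiplications land in cycle 1, which is what drives the multiplier count up to three.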

ALAP Schedule

[Figure: ALAP (as late as possible) schedule of the dependency graph over cycles 1 to 5]

• 1 adder
• 2 multipliers
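An ALAP schedule works backwards from the last cycle, placing each operation in the latest cycle that still feeds its users in time. This sketch assumes one-cycle operations and a five-cycle schedule; `OPS` and `alap` are illustrative names.

```python
from collections import Counter

# Dependency graph from the single assignment form: op -> (unit, predecessors)
OPS = {
    "p0": ("mul", []),
    "s0": ("add", ["p0"]),
    "s1": ("mul", ["s0"]),
    "p1": ("mul", []),
    "s2": ("add", ["s1", "p1"]),
    "p2": ("mul", []),
    "output": ("add", ["s2", "p2"]),
}

def alap(ops, last_cycle):
    """Place each operation in the latest cycle that still feeds its users."""
    users = {name: [] for name in ops}
    for name, (_, preds) in ops.items():
        for p in preds:
            users[p].append(name)
    cycle = {}
    def place(name):
        if name not in cycle:
            latest = min((place(u) for u in users[name]),
                         default=last_cycle + 1)
            cycle[name] = latest - 1
        return cycle[name]
    for name in ops:
        place(name)
    return cycle

schedule = alap(OPS, last_cycle=5)
# Peak number of multiplications sharing a cycle
mul_peak = max(Counter(c for n, c in schedule.items()
                       if OPS[n][0] == "mul").values())
print(schedule, mul_peak)
```

Delaying p1 and p2 until they are needed spreads the multiplications out, so fewer multipliers are required than in the ASAP case.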

Resource Constrained Schedule

[Figure: resource constrained schedule of the dependency graph over cycles 1 to 5]

• 1 adder
• 1 multiplier
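One common way to build a resource constrained schedule is list scheduling: in each cycle, start the most urgent ready operations until the available units run out. The slides do not say which method was used, so this is a sketch of one option. It assumes one-cycle operations and uses each operation's ALAP cycle (derived for a five-cycle schedule) as its priority; all names are illustrative.

```python
# Dependency graph from the single assignment form: op -> (unit, predecessors)
OPS = {
    "p0": ("mul", []),
    "s0": ("add", ["p0"]),
    "s1": ("mul", ["s0"]),
    "p1": ("mul", []),
    "s2": ("add", ["s1", "p1"]),
    "p2": ("mul", []),
    "output": ("add", ["s2", "p2"]),
}

# Assumed priority: the ALAP cycle of each operation (lower = more urgent)
PRIORITY = {"p0": 1, "s0": 2, "s1": 3, "p1": 3, "s2": 4, "p2": 4, "output": 5}

def list_schedule(ops, limits, priority):
    """Cycle by cycle, start ready operations while units remain free."""
    cycle, now = {}, 0
    while len(cycle) < len(ops):
        now += 1
        free = dict(limits)
        # Ready: not yet scheduled, all predecessors finished in earlier cycles
        ready = [n for n, (_, preds) in ops.items()
                 if n not in cycle
                 and all(p in cycle and cycle[p] < now for p in preds)]
        for name in sorted(ready, key=lambda n: priority[n]):
            kind = ops[name][0]
            if free[kind] > 0:
                free[kind] -= 1
                cycle[name] = now
    return cycle

schedule = list_schedule(OPS, {"mul": 1, "add": 1}, PRIORITY)
print(schedule)
```

With one adder and one multiplier the calculation still completes in five cycles, because the multiplier would otherwise sit idle in cycles 2 and 4.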

Allocation

• 1 adder
• 2 multipliers

[Figure: the schedule over cycles 1 to 5 with each operation allocated to one of mpy1, mpy2 or add1]

Allocate Registers

[Figure: the allocated schedule over cycles 1 to 5 with registers R1 to R10 placed on the values that cross cycle boundaries]

Share Registers

[Figure: the allocated schedule over cycles 1 to 5 with registers R1 to R11, grouped so that registers whose lifetimes do not intersect are shared]

• 5 Registers

Data Path

[Figure: data path. mux1 and mux2 feed mpy1, mux3 and mux4 feed mpy2, and mux5 and mux6 feed add1. The shared registers are R1/R6, R2/R5, R4/R8, R7/R10 and R3/R9/R11.]

Controller State Machine

Cycle 1: mux1 = 0, mux2 = 0, mux3 = 0, mux4 = 0; enable R1, R2
Cycle 2: mux5 = 0, mux6 = 0, mux3 = 1, mux4 = 1; enable R3, R4, R5
Cycle 3: mux1 = 1, mux2 = 1; enable R6, R7, R8
Cycle 4: mux5 = 1, mux6 = 1; enable R9, R10
Cycle 5: mux5 = 2, mux6 = 2; enable R11
