• No results found

ECE 5745 Complex Digital ASIC Design Course Overview

N/A
N/A
Protected

Academic year: 2021

Share "ECE 5745 Complex Digital ASIC Design Course Overview"

Copied!
47
0
0

Loading.... (view fulltext now)

Full text

(1)

ECE 5745 Complex Digital ASIC Design

Course Overview

Christopher Batten

School of Electrical and Computer Engineering

Cornell University

(2)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

RTL

Devices

ISA

PL

Algorithm

Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I

Course goal, structure, motivation

.

What is the goal of the course?

.

How is the course structured?

.

Why should students want to take this course?

I

Activity 1: Evaluation of Integer Multiplier

I

Case Study: Scalar vs. Vector Processors

.

Example design-space exploration

.

Example real ASIC chips

I

Activity 2: Brainstorming for Sorting Accelerator

I

Course Logistics

(3)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

The Computer Engineering Stack

Technology

Application

Gap too large to bridge in one step

(but there are exceptions e.g., magnetic compass)

In its broadest definition, computer architecture is the

design of the abstraction/implementation layers

that allow us to

execute information processing

applications

efficiently

using available manufacturing

technologies

(4)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

The Computer Engineering Stack

In its broadest definition, computer architecture is the

design of the abstraction/implementation layers that allow us to

execute information processing applications efficiently

using available manufacturing technologies

Circuits

Devices

Programming Language

Algorithm

C

o

m

p

u

te

r

A

rchit

e

ct

u

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

(5)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

What is Computer Architecture?

In its broadest definition, computer architecture is the

design of the abstraction/implementation layers

that allow us to

execute information processing

applications

efficiently

using available manufacturing

technologies

Circuits

Devices

Programming Language

Algorithm

C

o

m

p

u

te

r

A

rchit

e

ct

u

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Operating System

Gate Level

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Application Requirements

• Suggest how to improve architecture

• Provide revenue to fund development

Technology Constraints

• Restrict what can be done efficiently

• New technologies make new arch possible

Architecture provides feedback to guide

application and technology research directions

(6)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Key Metrics in Computer Architecture

P

I$

D$

Network

Network

Network

P

I$

P

I$

P

I$

D$

D$

D$

I

Primary Metrics

.

Execution time (cycles/task)

.

Energy (Joules/task)

.

Cycle time (ns/cycle)

.

Area (µm

2

)

I

Secondary Metrics

.

Performance (ns/task)

.

Average power (Watts)

.

Peak power (Watts)

.

Cost ($)

.

Design complexity

.

Reliability

.

Flexibility

Discuss qualitative first-order analysis

from ECE 4750 on board

(7)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Unanswered Questions from ECE 4750

P

I$

D$

Network

Network

Network

Xcel

P

I$

D$

D$

D$

Xcel

Accelerated

Instructions

I

How can we quantitatively evaluate

area, cycle time, and energy?

I

How do we actually implement

processors, memories, and

networks in a real chip?

I

How should we analyze

application-specific accelerators?

.

Very loosely coupled

memory-mapped accelerators

.

More tightly coupled co-processor

accelerators

.

Specialized instructions and

functional units

(8)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Application-Specific Accelerators

P

I$

D$

Network

Network

Network

Xcel

P

I$

D$

D$

D$

Xcel

Accelerated

Instructions

Performance (Tasks per Second)

Simple

Proc

Processor Power

Constraint

E

ne

rgy

(Jou

le

s

p

e

r

T

as

k)

C

B

A

A

A

A

Superscalar

w/ Deeper

Pipelines

Out-of-Order

Superscalar

Superpipelined

D

E

E

Multicore

F

F

Multicore E

A

A

A

A

Specialized Accelerators

(9)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Application-Specific Accelerators

P

I$

D$

Network

Network

Network

Xcel

P

I$

D$

D$

D$

Xcel

Accelerated

Instructions

Performance (Tasks per Second)

E

ne

rgy

E

ff

ici

e

ncy

(T

as

ks

p

e

r

J

o

u

le

)

Simple

Processor

Design Power

Constraint

High-Performance

Architectures

Embedded

Architectures

Design

Performance

Constraint

Fl

exibil

ity

vs

. S

peci

al

izat

io

n

Custom

ASIC

Less Flexible

Accelerator

More Flexible

Accelerator

(10)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Goal for ECE 5745 is to answer these questions!

P

I$

D$

Network

Network

Network

Xcel

P

I$

D$

D$

D$

Xcel

Accelerated

Instructions

I

How can we quantitatively evaluate

area, cycle time, and energy?

I

How do we actually implement

processors, memories, and

networks in a real chip?

I

How should we analyze

application-specific accelerators?

.

Very loosely coupled

memory-mapped accelerators

.

More tightly coupled co-processor

accelerators

.

Specialized instructions and

functional units

(11)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Full Custom Design vs. Standard-Cell Design

L01 –

Intro

duction 22

6.8

84

Sp

rin

g

2

00

5

2 Feb 2005

Full Custom Design

Designer

is

free

to

do

anything,

anywhere

though

each

design

team

usually

imposes

some

discipline

Most

time

consuming

design

style

Reserved

for

very

high

performance

or

very

high

volume

devices

(Intel

microprocessors,

RF

power

amps

for

cellphones)

Requires

complete

customization

of

all

layers

of

wafer

Piece

of

full-custom

multiplier

array,

1.0

m

2-metal

Full-custom layout

in 1.0µm w/ 2 metal

layers

I

Full-Custom Design

.

Designer is free to do anything, anywhere; though

team usually imposes some design discipline

.

Most time consuming design style; reserved for

very high performance or very high volume chips

(Intel microprocessors, RF power amps for

cellphones)

I

Standard-Cell Design

.

Fixed library of “standard cells” and SRAM

memory generators

.

Register-transfer-level description is automatically

mapped to this library of standard cells, then these

cells are placed and routed automatically

(12)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Standard-Cell Design Methodology

L01 – Introduction 24 6.884 – Spring 2005 2 Feb 2005

Standard Cell ASICs

Also called Cell-Based ICs (CBICs)

Fixed library of cells plus memory generators

Cells can be synthesized from HDL, or entered in schematics

Cells placed and routed automatically

Requires complete set of custom masks for each design

Currently most popular hard-wired ASIC type (6.884 will use this)

Mem

1

Mem

2

Cells arranged

in rows

Generated

memory arrays

L01 – Introduction 25 6.884 – Spring 2005 2 Feb 2005

Standard Cell Design

Cells have standard height but vary in width

Designed to connect power, ground, and wells by abutment

V

DD

Rail

GND Rail

Clock Rail

Cell I/O

on M2

Power

Rails in

M1

Clock Rail

(not typical)

NAND2

Flip-flop

Well Contact

under Power Rail

Ripple carry adder with carry

chain highlighted

(13)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Standard-Cell Design Methodology

Verilog RTL

Gate-Level Model

Verilog

Simulator

Verilog

Simulator

Switching Activity

Power

Analysis

Layout

Synthesis

Place&Route

Educational Synopsys

90nm Process

and Standard Cells

Area

&

Cycle Time

Execution

Time

Energy

(14)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Example Standard-Cell Chip Plot

Motivation

Architectural Patterns

Scale VT Core

Maven VT Core

Evaluation

Single-Lane Vector-Thread Unit w/ 256 Registers

MIT CSAIL

Christopher Batten

32 / 42

(15)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

What is Complex Digital ASIC Design?

Circuits

Devices

Programming Language

Algorithm

C

o

m

p

u

te

r

A

rchit

e

ct

u

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

layout ready for fabrication

Complex digital ASIC design is

the process of

quantitatively exploring the

area, cycle time, execution time, and

energy trade-offs

of various

automated standard-cell CAD tools

and then to transform the most

promising design to

using

general-purpose and

application-specific designs

(16)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

RTL

Devices

ISA

PL

Algorithm

Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I

Course goal, structure, motivation

.

What is the goal of the course?

.

How is the course structured?

.

Why should students want to take this course?

I

Activity 1: Evaluation of Integer Multiplier

I

Case Study: Scalar vs. Vector Processors

.

Example design-space exploration

.

Example real ASIC chips

I

Activity 2: Brainstorming for Sorting Accelerator

I

Course Logistics

(17)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Course Structure

Part 1

ASIC Design

Overview

Part 2

Digital CMOS

Circuits

Part 3

CAD Algorithms

P

P

M

M

Prereq

Computer

Architecture

(18)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Part 1: ASIC Design Overview

P P

M

M

Topic 1

Hardware

Description

Languages

Topic 2

CMOS Devices

Topic 3

CMOS Circuits

Topic 4

Full-Custom

Design

Methodology

Topic 5

Automated

Design

Methodologies

Topic 7

Clocking, Power Distribution,

Packaging, and I/O

Topic 8

Testing and Verification

Topic 6

Closing

the

Gap

(19)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Part 2: Digital CMOS Circuits

To

pi

c

10

Se

qu

ent

ial

St

at

e

To

pi

c

11

Int

erconne

ct

To

pi

c

9

C

om

binat

io

nal

Lo

gi

c

(20)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Part 3: CAD Algorithms

RTL to Logic

Synthesis

Technology

Independent

Synthesis

Technology

Dependent

Synthesis

x = a'bc + a'bc'

y = b'c' + ab' + ac

x = a'b

y = b'c' + ac

Placement

Detailed

Routing

Global

Routing

Topic 12

Synthesis Algorithms

Topic 13

Physical Design Automation

(21)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Five-Week Design Project

P

I$

D$

Network

Network

Network

Xcel

P

I$

D$

D$

D$

Xcel

Accelerated

Instructions

Performance (Tasks per Second)

E

ne

rgy

E

ff

ici

e

ncy

(T

as

ks

p

e

r

J

o

u

le

)

Simple

Processor

Design Power

Constraint

High-Performance

Architectures

Embedded

Architectures

Design

Performance

Constraint

Fl

exibil

ity

vs

. S

peci

al

izat

io

n

Custom

ASIC

Less Flexible

Accelerator

More Flexible

Accelerator

(22)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

RTL

Devices

ISA

PL

Algorithm

Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I

Course goal, structure, motivation

.

What is the goal of the course?

.

How is the course structured?

.

Why should students want to take this course?

I

Activity 1: Evaluation of Integer Multiplier

I

Case Study: Scalar vs. Vector Processors

.

Example design-space exploration

.

Example real ASIC chips

I

Activity 2: Brainstorming for Sorting Accelerator

I

Course Logistics

(23)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Course Motivation: Comp Arch Research Perspective

Transistors

(Thousands)

Frequency

(MHz)

Typical Power

(Watts)

MIPS R2K

Intel

Pentium 4

DEC Alpha

21264

Data partially collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond

1975 1980 1985 1990 1995 2000 2005 2010 2015

10

0

10

1

10

2

10

3

10

4

10

5

10

6

Sequential

Processor

Performance

10

7

Number

of Cores

Parallel Proc

Performance

Intel 48-Core

Prototype

AMD 4-Core

Opteron

(24)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Cross-Layer Interaction is Critical

Circuits

Devices

Programming Language

Algorithm

C

o

m

p

u

te

r

A

rchit

e

ct

u

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

Need to quantitatively understand

area, cycle time, and energy

trade-offs to be able to create

new revolutionary architectures

that tackle the most

challenging physical design

issues

(25)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Course Motivation: Circuits Research Perspective

Your Digital Circuit Here

Fighter Airplane

~100,000 parts

Your Analog Circuit Here

Intel Sandy Bridge E

2.27 Billion transistors

(26)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Cross-Layer Interaction is Critical

Circuits

Devices

Programming Language

Algorithm

C

o

m

p

u

te

r

A

rchit

e

ct

u

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

Digital circuit researchers need to

appreciate the system-level

context for their subsystems;

cross-layer interaction can

generate some of the

most exciting research ideas

(27)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Course Motivation: Industry Development Perspective

(28)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Course Motivation: Industry Development Perspective

I

Ever increasing chip size and complexity

.

Microprocessors: 100M gates growing to 1000M gates

.

ASICs: 5M to 10M gates growing to 50M to 100M gates

I

Ever increasing costs and design team sizes

.

More than $30M for a 10M gate ASIC

.

More than $1M for re-spin in case of error (not including redesign costs)

I

18 months to design, only an 8 month selling opportunity in market

.

Fewer new chip-starts every year

.

Looking for alternatives (e.g., FPGAs)

I

For chip-starts that do happen

.

Conservative design

.

No time for exploration

.

Educated guess and then implement

(29)

• Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

Cross-Layer Interaction is Critical

Circuits

Devices

Programming Language

Algorithm

C

o

m

p

u

te

r

A

rchit

e

ct

u

re

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

Most significant impact on area,

cycle time, execution time, and

energy is usually at architectural

and microarchitectural levels

(30)

Complex Digital ASIC Design

• Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

RTL

Devices

ISA

PL

Algorithm

Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I

Course goal, structure, motivation

.

What is the goal of the course?

.

How is the course structured?

.

Why should students want to take this course?

I

Activity 1: Evaluation of Integer Multiplier

I

Case Study: Scalar vs. Vector Processors

.

Example design-space exploration

.

Example real ASIC chips

I

Activity 2: Brainstorming for Sorting Accelerator

I

Course Logistics

(31)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Scalar Processors with Multithreading

Programmer's

Logical

View

T0

Memory

T1

T2

T3

T4

Ti

Execution

Resources

Control

Instructions

Instructions

Memory

Architectural

Registers

(32)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Scalar Processors with Multithreading

Programmer's

Logical

View

T0

Memory

T1

T2

T3

T4

Ti

loop:

load

a, a_ptr

load

b, b_ptr

add

c, a, b

store

c, c_ptr

a_ptr++

b_ptr++

c_ptr++

branch

Vector-Vector Add

Masked Filter

loop:

load

a, a_ptr

a_ptr++

branch

a = 0

op0

op1

...

branch

(33)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Scalar Processors with Multithreading

Programmer's

Logical

View

T0

Memory

T1

T2

T3

T4

Ti

Typical

Core

Micro-Architecture

E

ne

rgy

Performance

MIMD

T0

T1

T2

T3

Instr Memory

Data Memory

Multi-threaded

Cores

(34)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Vector-SIMD Processors

Programmer's

Logical

View

CT0

elm 0

elm 1

elm 2

elm 3

Memory

CTj

elm i

Architectural

Vector Register

with 4 Elements

Vector-SIMD

Arithmetic

Instructions

Vector-SIMD

Memory

Instructions

(35)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Vector-SIMD Processors

Programmer's

Logical

View

CT0

elm 0

elm 1

elm 2

elm 3

Memory

CTj

elm i

loop:

vload A, a_ptr

vload B, b_ptr

vadd C, A, B

vstore C, c_ptr

a_ptr++

b_ptr++

c_ptr++

branch

loop:

vload A, a_ptr

vset F, A = 0

vop0 under flag F

vop1 under flag F

...

a_ptr++

branch

Vector-Vector Add

Masked Filter

(36)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Vector-SIMD Processors

Programmer's

Logical

View

CT0

elm 0

elm 1

elm 2

elm 3

Memory

CTj

elm i

Typical

Core

Micro-Architecture

0

2

1

3

Instruction Memory

CP

VIU

VMU

Data Memory

V

e

ct

o

r

L

ane

s

E

ne

rgy

Performance

MIMD

Vector-SIMD

(37)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Quantitative Area Evaluation

Motivation

Architectural Patterns

Scale VT Core

Maven VT Core

Evaluation

Single-Lane Vector-Thread Unit w/ 256 Registers

MIT CSAIL

Christopher Batten

32 / 42

(38)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Quantitative Area Evaluation

Quad-Core

w/ Vertical

Multithreading

Vector-SIMD

1

thread

2

thread

s

4

thread

s

8

thread

s

1

co

re

w/

4

lanes

4

co

re

s

w

/

1

lane

e

a

data cache

instruction cache

control processor

integer functional units

floating-point functional units

memory units

register file

control logic

Single-core multi-lane design

reduces area by 15%

Multi-core single-lane design

increases area by 20%

1.75

1.25

1.00

0.50

0.00

0.25

0.75

1.50

N

o

rm

al

ize

d

A

rea

(39)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Quantitative Performance and Energy Evaluation

0.8

1.0

1.2

1.4

1.6

Normalized Tasks / Second

N

o

rm

al

ize

d

E

ne

rgy

/

T

as

k

1.6

1.4

1.2

1.0

0.8

0.6

0.4

25

20

15

10

5

0

E

ne

rgy

/

T

as

k

(

J)

Multithreaded Multicore

(increasing number of

threads per core)

Quad-Core

w/ Vertical

Multithreading

1

thread

2

thread

s

4

thread

s

8

thread

s

Single-Lane Vector

(increasing vlen)

Vector-SIMD

(4 core w/ 1 lane)

data cache

inst cache

control proc

int func units

memory units

register file

control logic

1

e

lm

/lane

2

e

lm

/lane

4

e

lm

/lane

8

e

lm

/lane

leakage

(40)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Simple RISC Processor ASIC

SP Datapath

SP Regfile

RAM

Subbank

(2KB)

RAM

Subbank

(2KB)

RAM

Subbank

(2KB)

RAM

Subbank

(2KB)

AHIP

Controller

SP Control

RAM Interface

VCO

Figure 2: STC1 chip plot and die photo. On the chip plot, the Exec Unit is colored blue, and the

Fetch Unit is colored green and purple.

Host

PC

PLX

9050

ATB0

Controller

Power

Supplies

Probe

Points

PCI

SDRAM

STC1

Host

ATB0

Daughter Card

Figure 3: Test System Block Diagram.

The test system includes the ATB0

test baseboard, a daughter card with

the STC1 chip, and a host computer

to control the test system.

5

(41)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Simple RISC Processor ASIC

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 750 --- . . . . 740 . . . . 730 . . . . 720 . . . . 710 . . . * * . 700 --- . . . * * * * . 690 . . . * * * * * . 680 . . . * * * * * * . . 670 . . . * * * * * * * . . 660 . . . * * * * * * * . . 650 --- . . . * * * * * * * * * * . 640 . . . * * * * * * * * * * . 630 . . . * * * * * * * * * * . . 620 . . . * * * * * * * * * * * * . 610 . . . * . * * * * * * * * * * . . 600 --- . . . * * * * * * * * * * * * . . 590 . . . * * * * * * * * * * * * * * . 580 . . . * * * * * * * * * * * * * . . 570 . . . * * * * * * * * * * * * * * * * 560 . . . * * * * * * * * * * * * * * * * 550 --- . . . * * * * * * * * * * * * * * * * * . 540 . . . * * * * * * * * * * * * * * * * . * 530 . . . * * * * * * * * * * * * * * * * * * 520 . . . * * * * * * * * * * * * * * * * * * * 510 . . . * * * * * * * * * * * * * * * * * * * 500 --- . . . * * * * * * * * * * * * * * * * * * * * 490 . . . . * * * * * * * * * * * * * * * * * * * * * 480 . . . . * * * * * * * * * * * * * * * * * * * * * 470 . . . * * * * * * * * * * * * * * * * * * * * * * 460 . . . * * * * * * * * * * * * * * * * * . * * * * 450 --- . . . * * * * * * * * * * * * * * * * * * * * * * 440 . . * * * * * * * * * * * * * * * * * * * * * * * 430 . . * * * * * * * * * * * * * * * * * * * * * * * 420 . . * * * * * * * * * * * * * * * * * * * * * * * 410 . * * * * * * * * * * * * * * * * * * * * * * * * 400 --- . * * * * * * * * * * * * * * * * * * * * * * * * 390 . . . * * * * * 380 . . . * * * * * 370 . . . * * * * * 360 . . . * * * * * 350 --- . . . . * * * * * * 340 . . . . * * * * * * 330 . . . . * * * * * * 320 . . . * * * * * * * 310 . . . * * * * * * * 300 --- . . . * * * * * * * 290 . . . * * * * * * * 280 . . * * * * * * * * 270 . . * * * * * * * * 260 . . * * * * * * * * 250 --- . . * * * * * * * * 240 . * * * * * * * * * 230 . * * * * * * * * * 220 . * * * * * * * * * 210 . * * * * * * * * * 200 --- * * * * * * * * * * 190 * * * * * * * * * * 180 * * * * * * * * * *

Figure 5: Shmoo plot for a

wide voltage range. The

hori-zontal axis plots supply

volt-age in mV, and the vertical

axis plots frequency in MHz.

A

*

indicates correct

opera-tion at that operating point

and a dot indicates failure.

The shmoo plot shows

com-bined results from two di

ff

er-ent chips.

la t9, input_end

forever:

la t8, input_start

la t7, output

loop:

lw

t0, 0(t8)

addu t8, 4

bgez t0, pos

subu t0, $0, t0

pos:

sw

t0, 0(t7)

addu t7, 4

bne

t8, t9, loop

b

forever

Figure 6: Test program for measuring

typical power consumption. The inner

loop calculates the absolute values of

integers in an input array and stores

the results to an output array. The

in-put array had 10 integers during

test-ing, and the program repeatedly runs

this inner loop forever.

7

I

RISC processor w/ 8 KB SRAM

I

TSMC 0.18 µm process

I

1.7

2.1 mm

I

610K Transistors

I

450 MHz at 1.8 V

(42)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Scale Vector-Thread Processor ASIC

(43)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Scale Vector-Thread Processor ASIC

TSMC 0.18µm • 7.14 Million Transistors • 16.6 mm

2

Core Area

CP

Vector Lane 0

Vector Lane 1

Vector Lane 2

Cache Control

Cache Tag CAMs

Cache

Data

RAM

Banks

V

M

U

V

I

U

X

B

A

R

Vector Lane 3

(44)

Complex Digital ASIC Design

Activity 1

• Case Study: Scalar vs. Vector Processors

Activity 2

Scale Energy vs. Performance Results

Motivation

Vector-Threading

Scale VT Processor

Maven VT Processor

Future Research

Energy-Efficiency of the Scale VT Processor

(45)

Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

• Activity 2

RTL

Devices

ISA

PL

Algorithm

Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I

Course goal, structure, motivation

.

What is the goal of the course?

.

How is the course structured?

.

Why should students want to take this course?

I

Activity 1: Evaluation of Integer Multiplier

I

Case Study: Scalar vs. Vector Processors

.

Example design-space exploration

.

Example real ASIC chips

I

Activity 2: Brainstorming for Sorting Accelerator

I

Course Logistics

(46)

Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

• Activity 2

Brainstorming for Sorting Accelerator

I$

D$

Arbiter

Xcel

P

Performance (Tasks per Second)

E

ne

rgy

E

ff

ici

e

ncy

(T

as

ks

p

e

r

J

o

u

le

)

Software

Baseline

Sort array of integers stored in memory

Proc sends Xcel base address and size

Proc waits for Xcel to finish sorting

I

Software Baseline

.

Insertion sort

O

(

n

2

)

.

20% of dynamic

instructions are lw/sw

I

Hardware Accelerator

.

What is ideal speedup

assuming we still use

insertion sort?

.

What if we use a

different algorithm?

.

Sketch hardware impl of

a sorting accelerator

.

Where will accelerator

be in the energy/perf

space?

(47)

Complex Digital ASIC Design

Activity 1

Case Study: Scalar vs. Vector Processors

Activity 2

RTL

Devices

ISA

PL

Algorithm

Arch

Technology

Application

OS

Gates

Circuits

Take-Away Points

I

Complex digital ASIC design is the process of

quantitatively exploring the

area, cycle time,

execution time, and energy

trade-offs of

general-purpose and application-specific designs

using

automated standard-cell CAD tools

and then

to transform the most promising design to

layout

ready for fabrication

I

Course can provide useful experience with

cross-layer interaction for students interested in

pursuing research in

computer architecture

or

circuits

or students interested in pursuing a career

in in

industry development of ASICs

References

Related documents

In short, the Prospero framework allows the public to both contribute functionality and content in the form of widgets, and control the widgets displayed on

Aiming to align provider incentives toward improving quality and effi- ciency, the Center for Medicare and Medicaid Services is considering broader bundling of hospital and

Only preoperative use of pain medication and preopera- tive neuropathic pain were associated with increased postoperative pain after primary THA in a fast-track set- ting, including

Standardized evidence-based approaches to chronic health conditions with flexible and tailored self-management and personalized co-care plans could better match con- sumer needs

145 (This is another question the forty-year-old statute understandably failed to address.) If the consumer is responsible for the allegedly infringing acts, then the

However, hydrogen adminis- tration reduced the number of MDA-positive vessels in marrow compared with the model group, indicating that hydrogen considerably decreased the damage

Besides, given that there is lack of an operational definition for frailty and sarcopenia, it would be a worthwhile endeavor to investigate the com- bined or sequential use of

Keywords: Anatomical and reverse total shoulder arthroplasty, Suprascapular nerve, Screw placement, Glenoid cavity, Anteversion angle, Retroversion angle, Inclination angle,