Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm


Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

7-2005

Modeling of a hardware VLSI placement system:

Accelerating the Simulated Annealing algorithm

William Merle Batts Jr.

Follow this and additional works at:

http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact RIT Scholar Works.


Modeling of a Hardware VLSI Placement System:
Accelerating the Simulated Annealing Algorithm

by

William Merle Batts Jr.

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Engineering

Supervised by
Visiting Assistant Professor of Computer Engineering Dr. Marcin Lukowiak

Department of Computer Engineering
Kate Gleason College of Engineering
Rochester Institute of Technology
Rochester, New York
July 2005

Approved By:

Dr. Marcin Lukowiak
Visiting Assistant Professor of Computer Engineering
Primary Adviser

Dr. Stanislaw Radziszowski
Professor of Computer Science

Dr. Greg Semeraro

Thesis Release Permission Form

Rochester Institute of Technology
Kate Gleason College of Engineering

Title: Modeling of a Hardware VLSI Placement System: Accelerating the Simulated Annealing Algorithm

I, William Merle Batts Jr., hereby grant permission to the Wallace Memorial Library to reproduce my thesis in whole or part.

William Merle Batts Jr.

Dedication

For all of my friends and family, especially Denise, whose unwavering support made this

Acknowledgments

I'd like to thank my adviser Dr. Marcin Lukowiak for his guidance, wisdom and unending patience. I'd also like to thank my committee members Dr. Stanislaw Radziszowski and Dr. Greg Semeraro, whose input and time given to this work is greatly appreciated.

Abstract

An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wire length and performance degradation; however, it is compute intensive and is NP-complete in terms of obtaining an optimal solution. As designs have grown in complexity and gate count, obtaining an optimal solution is not feasible due to time-to-market constraints or the sheer compute effort required. Heuristic algorithms allow for efficient but sub-optimal designs to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA).

The goal of this work was to develop a model that would enable an analysis into the feasibility of developing a hardware accelerated placement system which uses SA at its core. The SA heuristic was analyzed for possible improvements in efficiency with focus given to targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as having the possibility to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware accelerated model.

A large speedup was analytically shown from both accelerating the critical path of the SA algorithm as well as novel methods of improving SA's efficiency. As data throughput

Contents

Dedication
Acknowledgments
Abstract
Glossary

1 Introduction

2 Motivation

3 Background
    3.1 IC Design Flow
        3.1.1 Concept - Research & Development
        3.1.2 Concept - High Level Design
        3.1.3 Design Entry
        3.1.4 Synthesis
        3.1.5 Placement & Routing
        3.1.6 Physical Verification & Simulation
        3.1.7 Fabrication
    3.2 Electronic Design Automation
        3.2.1 Logical Description
        3.2.2 Structural Description
    3.3 Library Exchange Format/Design Exchange Format
        3.3.1 LEF Syntax
        3.3.2 DEF Syntax
    3.4 Physical Design Automation
        3.4.1 Placement
        3.4.2 Routing
        3.4.3 Back Annotation

4 Placement
    4.1 Exhaustive Search
    4.2 Generalized Hill Climbing/Local Search
    4.3 Min-Cut
    4.4 Genetic Algorithms
    4.5 Tabu Search

5 Simulated Annealing
    5.1 Physical Model
    5.2 Application to Combinatorial Optimization
    5.3 SA & Placement
    5.4 SA Research
        5.4.1 Parallel SA
        5.4.2 Hardware Assisted SA
        5.4.3 Greedy Mixed Perturbations
        5.4.4 Multiple Heuristic Combination

6 Software Implementation
    6.1 Software Revisions and Lessons Learned
        6.1.1 Experimental Code
        6.1.2 Initial LEF/DEF Placements
        6.1.3 Benchmarks
    6.2 Logical Modules
        6.2.1 LEF/DEF Reader & Writer
        6.2.2 Design Perturbation
        6.2.3 Wirelength Estimator
        6.2.4 Overlap Calculation
        6.2.5 Move Acceptance
        6.2.6 Design Update
        6.2.7 Temperature Update
    6.3 Temperature Schedule
    6.4 Benchmark Circuits
    6.5 Characterization & Optimization

7 Hardware Model
    7.1 Wire Length Estimator
    7.2 Overlap Detector
    7.3 Move Acceptance Logic
    7.4 Interfaces
        7.4.1 Datapath Control
        7.4.2 Component Supply
        7.4.3 Design Update & Algorithm Management

8 Method of Investigation

9 Results
    9.1 Software Implementation
    9.2 Hardware Model
    9.3 Speedup Justification

10 Conclusions
    10.1 Discussion
    10.2 Future Work
        10.2.1 Software Implementation
        10.2.2 Control Unit
        10.2.3 Data Management
    10.3 In Closing

Bibliography

List of Figures

3.1 Simple IC Design Flow
3.2 One-bit Adder Example VHDL Description
3.3 One-bit Adder Logical Schematic
3.4 LEF Syntax for SIZE-only MACRO Statement
3.5 Example LEF Definition of a One-Bit Adder's Components
3.6 Bounding Box Wirelength Estimation and Overlap Penalty
3.7 DEF Syntax for Simple Net Listing
3.8 Example DEF Definition of a One-Bit Full Adder
3.9 One-bit Adder Initial Random Placement
3.10 One-bit Adder Final Placement
3.11 Example DEF Definition of a One-Bit Full Adder
4.1 Series of Neighboring Solutions Containing a Local Minimum
5.1 Molecules' Movements per Temperature Region
5.2 Simulated Annealing Placement Flow Chart
5.3 Bounding Box Wirelength Estimation and Overlap Penalty
5.4 Characteristics Justifying FPGA Application
6.1 Initial Software Algorithm Organization
6.2 Software Module Interaction
6.3 Sample Temperature Schedule
6.4 Software SA Placement Time Profile
6.5 Software SA Placement Cost vs. Iteration
6.6 Initial and Final Placements of Benchmark ibm05
7.1 Top Level RTL Schematic
7.2 Wirelength Estimator RTL Schematic
7.3 Overlap Detector RTL Schematic
7.4 Move Acceptance Logic RTL Schematic
8.1 Test Method

List of Tables

6.1 Benchmark ibm01 - ibm04 Information
6.2 Benchmark ibm01 - ibm04 Time Profiles (% total)
6.3 Cost Delta Function Time Analysis
6.4 Average Cost Delta Function Composite Analysis
7.1 Top Level Hardware Input Interface
7.2 Top Level Hardware Output Interface
9.1 Time Comparison to a Mature Placement Tool
9.2 Die Grid Points per Benchmark
9.3 FPGA Clock Cycle Requirement per Calculation

Glossary

A

ASIC  Application Specific Integrated Circuit - An IC which has been constructed to perform a limited range of functions very efficiently, usually faster than a GPP. Its functionality cannot be changed after it has been manufactured.

C

CAD  Computer Aided Design - Design work managed and enhanced by the use of computer technology.

component  A unit which provides some type of functionality to a hardware design and may be combined with other units to implement more complicated behavior. An adder is a common example of a component, providing the ability to add two values with the ability to be combined into a larger design, such as a multiplier.

D

DEF  Design Exchange Format - An ASCII text based format which defines a design's specific organization in terms of instances of components and interconnections. Used in conjunction with LEF. A community project organized by the Silicon Integration Initiative (SI2).

die  The semiconductor target on which integrated circuits are constructed.

E

EDA  Electronic Design Automation - A CAD technique allowing the design flow of electronic devices to become more manageable, reducing associated overhead.

F

FPGA  Field Programmable Gate Array - A digital processing device which has the ability to be programmed after it is manufactured, allowing its functionality to be changed.

FSM  Finite State Machine - Logic which consists of a finite number of states with transitions and outputs defined by the current state and possibly the value of inputs.

G

GPP  General Purpose Processor - A processor which implements functionality through groups of instructions and generic functional units rather than specialized data structures.

H

HPWL  Half Perimeter Wire Length - A method of approximating a net's interconnection wirelength by fully enclosing it in a minimal bounding box and taking half of the perimeter.

I

IC  Integrated Circuit - A general term for an electronic device which contains active semiconductor switches and possibly passive devices such as resistors or

IP  Intellectual Property - Original work that is, at its core, intangible; an algorithm is an example of this.

L

layer  The fundamental construct of a semiconductor device. The combination of different types of layers is used to implement functionality. Normally metal layers are used for interconnection while N+, P+ and polysilicon are used to create transistors.

LEF  Library Exchange Format - An ASCII text based format which defines the specific IC technology and components used to implement a design. A community project organized by the Silicon Integration Initiative (SI2).

LUT  Look-up Table - A construct used to implement logic functionality. Instead of implementing direct logic, these are programmable memories.

N

net  A connection between the ports of two or more components or I/O pads; can be envisioned as wires connecting the ports of components.

netlist  The collection of nets and components which describe a specific design.

P

PDA  Physical Design Automation - A CAD technique allowing the design flow of electronic devices to become less complicated in terms of satisfying design rule checks, electrical properties, physical organization, etc.

port  The point on a component to which a net connects.

R

routing  The act of physically defining all nets' connections on the target semiconductor die. Usually performed with metal layers.

S

SA  Simulated Annealing - A stochastic heuristic which uses a cooling material as a model in order to solve combinatorial problems.
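As an illustration of the HPWL entry above, the estimate can be sketched in a few lines of Python. This is a minimal sketch, not the thesis implementation: the function name `hpwl` and the representation of a net as a list of `(x, y)` pin coordinates are assumptions made for the example.

```python
from typing import List, Tuple

def hpwl(pins: List[Tuple[float, float]]) -> float:
    """Half perimeter wire length of one net.

    `pins` is a hypothetical representation: one (x, y) coordinate per
    port that the net connects.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # Width plus height of the minimal bounding box equals half its perimeter.
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# A three-pin net whose bounding box is 3 wide and 4 tall.
print(hpwl([(0, 0), (3, 4), (1, 2)]))
```

The estimate depends only on the extreme pin coordinates, so moving a pin that lies strictly inside the bounding box does not change it.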

Chapter 1

Introduction

As time-to-market pressures and integrated circuit (IC) design complexity increase, reduction of time for any step in the design flow may provide an advantage, technically and economically. An integral step in the IC design flow is placement, in which components targeted to define a device's functionality are logically placed on the semiconductor die. Efficient placements are desirable in that operation is improved by reducing delays, parasitic losses and, if considered during placement, other cost factors [30]. The act of finding a placement in terms of a global optimum is an NP-complete combinatorial optimization problem [36], meaning that it is not feasible to approach the placement of a large design with a brute force method [21]. In order to reduce computational requirements when performing placements, heuristics are often employed [2] [33] [40]; these algorithms do not produce optimal results but do produce acceptable solutions (satisfying design constraints) while reducing computing time. For example, there are over 200,000 placement solutions for a nine component design on a three by three grid, all of which must be evaluated in order to find an optimal solution. Using a heuristic method one can reduce the number of evaluations to hundreds, a clear savings of computing effort. One such heuristic method used in placement is Simulated Annealing (SA) which, as its name implies, is modeled after the cooling of metal and the behavior its molecules exhibit [1]. The focus of this work is to analyze, optimize and accelerate the Simulated Annealing heuristic with respect to its use in implementing a placement algorithm.
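The size of the brute force search space quoted above can be checked with a short calculation. The sketch below assumes that a placement is an assignment of distinct components to distinct grid sites and that every assignment counts as a separate solution (no symmetry reduction); the function name `count_placements` is chosen for illustration only.

```python
from itertools import permutations
from math import factorial

def count_placements(num_components: int, num_sites: int) -> int:
    """Ways to assign distinct components to distinct grid sites."""
    # Ordered selection of sites: num_sites! / (num_sites - num_components)!
    return factorial(num_sites) // factorial(num_sites - num_components)

# Nine components on a 3 x 3 grid: every site is used, so the count is 9!.
print(count_placements(9, 3 * 3))  # 362880

# Cross-check the formula by explicit enumeration on a smaller case.
assert sum(1 for _ in permutations(range(4), 4)) == count_placements(4, 4)
```

Under these assumptions the count is 9! = 362,880, consistent with the "over 200,000" figure, and it grows factorially with the number of components.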

This document is organized as follows: Chapter 2 provides a motivation for this work by defining its place in current IC design flows and exploring prior methods of accelerating placement heuristics. Chapter 3 provides the background of electronic design automation and IC design flow. Chapter 4 describes in depth and compares algorithms which can be used for placement. Chapter 5 provides an analytical description of the Simulated Annealing algorithm and previous research looking to improve its speed as a placement heuristic. Chapter 6 describes the software implementation of a placement tool used to characterize the timing and analyze the critical path of the Simulated Annealing algorithm. Chapter 7 describes the hardware model generated using observations from the software implementation. Chapter 8 gives the analytical process used to validate the hardware model's increase in performance. Chapter 9 details the results of the investigation while Chapter 10 states conclusions drawn from the investigation results and provides future direction for

Chapter 2

Motivation

It would be difficult to argue that any technique or method allowing for a decrease in the development time of a product would not be desirable. This is especially true for the electronics industry, which has no foreseeable slowdown in innovation and development [35]. Many goods currently ship with IC devices providing specialized capabilities, and these devices are subject to a design process that may be the critical path in the product development cycle. To allow further growth in the complexity of these devices without imposing restrictive development time overhead, accelerated methods at the core of the design flow should be sought [41].

As the gate count increases (in the above mentioned devices) so does the associated development time in terms of computational requirements. Computer aided design (CAD) of electronic devices and targeting them to physical fabrication (commonly known as Electronic Design Automation (EDA) and Physical Design Automation (PDA), respectively) has reduced this time to a great extent, but improvement is required as designs grow. Current EDA and PDA tools automatically satisfy the various requirements particular to fabrication processes while optimizing the design where possible (Mentor Graphics Design Architect, Mach TA and Calibre [24]; Synopsys Galaxy, DesignWare and Discovery [39]; Cadence Encounter [10]). The placement process is compute intensive [30] [36] and represents a significant amount of time in the design flow. As with any of the steps in the design flow, it

this step within a development path. Simulated Annealing [21] is a very common algorithm used to implement a placement tool [2] [32] [33] [40]; accelerating this algorithm has therefore been the center of many studies [11] [14] [18] [23].

Prior work in this field has firstly focused on improving the Simulated Annealing algorithm through analyzing and modifying the perturbation types and cost functions [14] [18] [19] [28]. Other work has looked to parallel implementation of the Simulated Annealing algorithm purely in software to produce a speedup [1] [11]. The serial nature of the algorithm does not directly lend itself to this approach, though parallel implementation has been shown to be successful [8] [23]. Other approaches to speeding up the Simulated Annealing algorithm have focused on hybrid implementations using other search methods as an augmentation [18] [20]. Hardware implementations have looked to specialized data and processing structures designed to be implemented in large matrices of execution elements on FPGAs [16] [41].

Analyzing the above prior work, it seems a hardware accelerated parallel processing approach would have the ability to provide a substantial speedup. Taking direction from [8] [11] [23] and [41], this work seeks to first characterize the pure software placement tool using SA as its core heuristic to identify the critical path in the data flow. Having this information, a tailored datapath can be developed which would provide some amount of speedup over the software tool. Considering hardware interface requirements (memory access, data structures, etc.) it also seems feasible that the parallel operation of multiple identical datapaths would produce near-linear speedups to a point. Applying knowledge of the behavior of the SA algorithm when applied to placement also provides novel approaches to a hardware

Chapter 3

Background

3.1 IC Design Flow

Implementing an electronic design in an integrated circuit is by no means a trivial task; a number of steps occur between the conception of the idea and the delivery of the packaged IC. Each of the steps in the design process is highly correlated with the others [30]; the process is usually not executed linearly. Figure 3.1 gives an overview of the IC design flow steps. This figure is very simplified and only gives a general outline of the entire process, as each step contains several underlying steps. As shown, the process may reverse

Figure 3.1: Simple IC Design Flow (Concept; Design Entry; Synthesis; Placement & Routing; Physical Verification & Simulation; Fabrication; with Review/Validation/Testing)

Of the following steps, this thesis focuses on placement. Here the components of the physical design are arranged in an attempt to produce an optimized layout. All other steps are outside the scope of this work but are important in that together they define a semiconductor design flow.

3.1.1 Concept - Research & Development

The initial phase of design involves the analysis of the original idea, refinement and research. Some initial steps are to determine usefulness, profitability, feasibility, which target technology is used, and overall project goals/requirements. Any of the decisions made here will impact the rest of the process; an example of this would be the target technology. If high performance and volume is a project goal, application specific IC (ASIC) design may be targeted, whereas if cost is a limiting factor or the design is to be produced in small volumes or for prototyping, less expensive standard cell or FPGA techniques may be utilized. Each decision here leads the project down a different design flow, so it must be carefully considered in this step.

3.1.2 Concept - High Level Design

Here the overall system architecture is defined and subsystems appear which may also be further broken down into smaller components. The high level design is taken from the results of the previous step and represented in modeling tools. At this point, either proprietary or target technology vendor supplied libraries may be utilized to reduce duplicate design effort, as subsystems of the design may be readily available in these libraries as common components. Early insights to optimizations can be discovered in this step such as data

3.1.3 Design Entry

This step begins the use of EDA CAD tools and generates the logical representation of the design. Two traditional types of design entry are typically used, schematic capture and text based modeling languages, with the latter being more popular for large designs. Schematic capture involves using a GUI to represent components and connections. The designer creates a diagram to define a system, from which the tool then creates an intermediate representation to be passed on to the next step. Modeling languages offer portability and self documentation, wherein hardware designs are represented through the use of source code. Two very common languages are VHDL [13] and Verilog HDL [12]; these are widely used to define hardware systems and to simulate a design before it is mapped to a target technology. Other languages such as SystemC [26] look to fill in the design flow gaps that the two aforementioned languages leave open by not allowing overall systems to be modeled in great detail.

Many IDEs allow for the concurrent use of both schematic entry and a modeling language in order to leverage the strengths of both methods (the high-level design can be viewed in schematics and the low-level components can be viewed in HDL). Using such a tool allows for any changes in one method to update the other, maintaining coherency across all views.

A designer can find optimizations in this step through intelligent construction; a good designer will produce efficient, correct source code. Some tools used in this step are Mentor Graphics Design Architect [24], OrCAD PSpice [10] and Xilinx IDE (Integrated Design Environment) [42], Mentor Graphics ModelSim [24] and Aldec Active-HDL [5] for model language and/or schematic capture development and simulation.

3.1.4 Synthesis

The designs entered in the previous step are now translated into library or custom com

size, shape, connection ports, and electrical characteristics. This is the first step in placing the design onto the target semiconductor die.

Each component has ports which serve to move signals to and from an exterior connection, while groups of ports, usually from different components, may be connected together to form nets. It is the unique combination of components and connections which gives one design different characteristics from another.

The choice of technology used greatly influences this step. If an FPGA or standard cell library is used, the designer may use a tool to convert the design into its physical form, in which a component list and a netlist will be generated representing the project in circuit form. If full custom ASIC technology is chosen, another designer will have to create components from the output of the previous step, either from scratch or from a generic form component. Some tools used here are Mentor Graphics' Design Compiler [24] and Cadence's BuildGates [10]. At this point the CAD tools in use move from EDA to PDA as the logical representation of the design is complete; further steps deal with applying the logical design to the physical process chosen in the concept stages.

3.1.5 Placement & Routing

The components from the previous step are physically applied to the floor of the target semiconductor die and the physical design begins to take shape. Here the placement of each component generated in the previous step with respect to every other component becomes important in order to minimize wiring delays and congestion as well as to minimize the target die size. As the focus of this work involves algorithms at the core of this step, placement is expanded upon later.

Routing (instantiating the connections defined by all nets on the semiconductor die) is of obvious concern, as again wire lengths should be optimized to improve performance. Most designs are automatically placed and routed for any project of appreciable size. Direct manipulation by the designer is sometimes warranted but this is limited to small areas which require attention. Efficiently routing a

Placement and routing may be performed as two independent sub-steps or as a single integrated step. Performed as independent sub-steps, routing requirements must be considered in the placement step so as to provide a routable design. The possibility of having to partially re-place the design exists, as it is possible that some aspect of the initial placement will create problems in routing. The integrated place and route step may possibly suffer from unacceptably long run times, as the search space created by combining both steps is much larger than that of either step by itself. With current semiconductor processes offering full routing over the components, where all interconnections exist above the transistor layers, independence of the two steps becomes more reasonable.

3.1.6 Physical Verification & Simulation

At this stage a physical representation of the project is complete and all timing and electrical characteristics of the system can be known, allowing for an accurate simulation to take place. This is known as post-place and route simulation and makes use of parasitics extraction, which uses the geometries of each transistor to fully specify a very detailed model. Previous simulations could not account for these values (due to their being specific to physical construction), and they were either ignored or estimated. Here the system can be measured to ensure the physical design will meet criteria set forth in the preceding stages.

Now having the physical characteristics of the target, post layout verification such as design rule checks and layout versus schematic can be performed to make sure the layout does not violate any fabrication rules and behaves as the designers intend, respectively. Any mistakes here will likely send the project back to the place and route stage or worse, the synthesis stage if a major fault is discovered. It is possible that the fundamental design would require modification, at which point there is no choice but to re-design around the problem and re-enter the correction in the logical representation. This means that care should be taken up to this point to ensure correctness. If the design passes all tests it

3.1.7 Fabrication

If the design is targeted to an ASIC, the physically defined project is created in a semiconductor foundry, first as whole wafers, then individual dies and finally packaged dies ready for use. In order to efficiently produce ICs, a built-in self test can be included in the design to allow dies to be tested before packaging. In this way, if a die fails its self test it can be discarded before packaging, saving time and money. If an FPGA is the target of the design, the FPGA is programmed using the bitstream generated by the design suite targeted toward the particular FPGA used. Typically, the manufacturer of the FPGA provides a software package to take a design from a concept to the finished, programmed FPGA without reliance on third-party tools, though third-party suites exist that replace this functionality [5]. The finished product is then marketed and sold or included in a larger project depending on its purpose.

3.2 Electronic Design Automation

Electronic Design Automation is a CAD technology aimed at managing the requirements of working with designs targeted to work inside of electronic technologies (custom IC, standard cell, FPGA). The term EDA is usually an umbrella term applied to all CAD technologies used to manage designs from ideas to silicon; here, EDA applies to all CAD tools used before application to the physical process, while physical design automation (PDA), discussed below, involves all CAD tools used to manage a design after this point.

EDA is widely employed as a method to ensure that a group's intellectual property (IP) is properly utilized by allowing modularization of designs and creation of proprietary libraries. Furthermore, and more importantly, EDA tools allow for reuse of previously created IP, reducing duplicate effort. This organizational function of EDA tools is not their primary focus; EDA tools allow one to navigate overwhelmingly large designs with relative ease. Designs have grown in size both in subsystem hierarchies and pure transistor count

revelation, in 1965 Intel co-founder Gordon Moore stated that IC transistor count will double every two years. In 1971 the Intel 4004 had a transistor count of 2,300; by 1982, with the Intel 286, the count had risen to over 130,000. Known as Moore's Law, this prediction has held true and is foreseen to do so [35].
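The two data points quoted above allow a quick arithmetic check of the doubling rule. The sketch below only uses the figures given in the text (2,300 transistors in 1971, 130,000 in 1982); the function names are illustrative and nothing here is taken from the thesis itself.

```python
from math import log2

def projected_count(start_count: float, start_year: int, end_year: int,
                    doubling_years: float = 2.0) -> float:
    """Transistor count predicted by doubling every `doubling_years` years."""
    return start_count * 2 ** ((end_year - start_year) / doubling_years)

def implied_doubling_period(start_count: float, start_year: int,
                            end_count: float, end_year: int) -> float:
    """Doubling period (in years) implied by two observed data points."""
    return (end_year - start_year) / log2(end_count / start_count)

# Projection from the 4004 (1971) to 1982 with a two-year doubling period:
# roughly 104,000 transistors.
print(round(projected_count(2300, 1971, 1982)))

# Doubling period implied by the 4004 and 286 figures: a little under two years.
print(round(implied_doubling_period(2300, 1971, 130_000, 1982), 2))
```

With these inputs the observed growth is slightly faster than a strict two-year doubling, which is in line with the statement that the count had risen to over 130,000.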

EDA's primary purpose is to provide a method of translating a design input description to a logic description, i.e., combinations of basic logic functions such as AND, OR, XOR. As previously mentioned, the input type could be an integrated development environment (IDE) based schematic capture where a design is "drawn" in terms of visual elements or a text based language such as VHDL. The EDA package will take these inputs and create generic logical descriptions which may be optimized using libraries either previously created and archived by developers or provided by the target technology supplier.

The strength of EDA CAD technology is being able to simply represent, navigate and test large designs. EDA tools provide the intelligence to optimize designs with previously and specifically developed components. This saves duplicate development time, allowing a group to build upon previous work, and provides superior implementations of systems without requiring intimate knowledge of the target technology, optimal logical function implementations or excessive interaction with the designer.

3.2.1 Logical Description

The logical description represents the first step inside an EDA CAD tool toward the realization of a design as a fully functional electronic device. Taking as a very small example the generation of a one-bit full adder, one can show the translation from input to logic (usually the full adder is an atomic element of an electronic design, but for this example it can be decomposed). From fundamentals, a one-bit full adder is given as

$$S_o = A \oplus B \oplus C_i \tag{3.1}$$

$$C_o = (A \cdot B) + ((A + B) \cdot C_i) \tag{3.2}$$

where A, B and Ci represent the adder's inputs and So and Co represent the adder's sum output and carry output, respectively.

Having its behavior defined, the design must be entered using one of the aforementioned methods; the core VHDL is presented in Figure 3.2 while a schematic capture is given by Figure 3.3.

    So <= A xor B xor Ci;
    Co <= (A and B) or ((A or B) and Ci);

Figure 3.2: One-bit Adder Example VHDL Description

The syntax for the VHDL statements assigns the logic value of the function on the right hand side of the signal assignment operator (left arrow) to the output signal on the left. A VHDL compiler will then analyze the file containing these statements and assemble a logical description which can be visualized by Figure 3.3, which also would be the input of a schematic capture EDA tool.

Figure 3.3: One-bit Adder Logical Schematic

The EDA tools, now having the logical representation of the design, can determine how the system will act given certain inputs; in other words, the design can be tested and examined for correct behavior. Usually, this is the first time the design is tested with actual inputs and usually begins at the module level such that, given known interfaces, individual designers can independently create modules that will produce a working system after

testbench is generated that contains test inputs with known outputs such that fast go/no-go tests can be executed during development, reducing the code/test/debug cycle's time. For something as simple as the one-bit adder, an exhaustive test set would be used to check all possible combinations of inputs, whereas for development purposes in a much larger design a limited test set would be used for spot checks during development. A larger, more complete test would then be executed to ensure that the module will correctly function before integration into the system.
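The exhaustive test described above can be sketched for the one-bit full adder as follows. This is a software illustration in Python rather than a VHDL testbench; the function names are assumptions made for the example, and the gate expressions mirror the two assignments of Figure 3.2. The known-good output for each input combination is taken from ordinary integer addition.

```python
from itertools import product
from typing import Tuple

def full_adder(a: int, b: int, ci: int) -> Tuple[int, int]:
    """One-bit full adder using the same gate expressions as Figure 3.2."""
    so = a ^ b ^ ci                  # So <= A xor B xor Ci
    co = (a & b) | ((a | b) & ci)    # Co <= (A and B) or ((A or B) and Ci)
    return so, co

def exhaustive_test() -> bool:
    """Go/no-go check of all 2**3 input combinations against integer addition."""
    for a, b, ci in product((0, 1), repeat=3):
        so, co = full_adder(a, b, ci)
        # The two output bits, read as a 2-bit number, must equal the sum.
        if 2 * co + so != a + b + ci:
            return False
    return True

print(exhaustive_test())  # True
```

Eight cases are enough here because the adder has three one-bit inputs; the number of cases doubles with every added input bit, which is why larger designs fall back on limited test sets during development.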

The basic building block used to implement a function within a design is known as a logic gate; within the example figure, xor, and, and or are all instances of gates which implement basic logic functions. The elements in this example can also be called components, a structure which implements some amount of functionality. In this case the components implement basic functionality; they may implement more complex functions and it is up to the designer of the library to define their granularity. In synthesizing an arbitrary function it may be more efficient to implement a multiplexer-based look-up table (LUT) rather than pure logic as shown here. A synthesis tool will perform forward and reverse elimination in order to determine this; its operating theory is outside the scope of this thesis. Here the tool traverses the boundary between logical and structural descriptions in mapping the logic to components. It is likely that, unless an optimized library component exists for the example one-bit adder and depending on the level of optimization, it would be instantiated in a LUT.
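To illustrate the difference between pure logic and a look-up table, the sketch below tabulates the carry function of Figure 3.2 and then evaluates it by indexing the table. It is a software analogy only; the helper names are hypothetical and no particular synthesis tool behaves exactly this way.

```python
from itertools import product


def carry_logic(a, b, ci):
    """Carry output written as pure logic, as in Figure 3.2."""
    return (a & b) | ((a | b) & ci)


def build_lut(func, n_inputs):
    """Tabulate func for every input combination (first input is the MSB)."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def lut_lookup(lut, *bits):
    """Select one table entry, as a multiplexer driven by the inputs would."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return lut[index]


# Eight stored bits replace the two AND gates and two OR gates of the carry path.
CARRY_LUT = build_lut(carry_logic, 3)
```

Any three-input function fits the same eight-entry table, which is why a LUT can be the more convenient target when no optimized library component exists.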

To be defined as a component, interconnections must be able to be made to/from other components. To specify the points where these connections are made, ports are used. A port is, as its name implies, a path which passes from outside the component to the functional elements inside. A port may have properties which specify the direction of the logical data flow in order to allow an EDA tool to determine that a component is used correctly. The interconnections which are made between ports of components are known as nets. Typically, a net only connects to one port on a component and between a limited number of components, but may theoretically connect any number of ports of any number of components.

The electrical properties of the target technology limit the number of input ports an output port can drive. This value is known as fan-out. Conversely, there are real-world limitations on the number of outputs one input can support; this is not usually encountered as multiple drivers of an input are avoided (or at least advised against during EDA processing). This value is known as fan-in and is, along with fan-out, calculated for each net and verified not to exceed limits as defined by the target technology. In the example above, the nets can be identified as N1, N2, N3, N4, A, B, Ci, So and Co; the input nets A and B experience a fan-out of three each, the input net Ci experiences a fan-out of two and all internal nets experience a fan-out of one. The output nets So and Co will experience fan-outs determined by the full adder's place in a larger design. As with function to component mapping, a synthesis tool will also consider these loading values when selecting a particular implementation. It is possible that a faster or smaller implementation may violate a loading constraint which would then require buffer(s) to remedy. This solution may increase a signal's latency (having to pass through the additional buffer(s)), thereby possibly decreasing the maximum operating speed. Selecting an implementation which is less efficient but acceptable in terms of satisfying loading constraints may be a better solution. Since no additional buffers are required, signal latency may be reduced and the maximum operating speed may be higher than the more efficient implementation. The synthesis tool takes this into account when determining which particular components are used in function mapping.
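The fan-out values quoted above can be reproduced by counting the input ports attached to each net. The sketch below is illustrative only: the netlist dictionary mirrors the example's nets, the port name Y is assumed to be every gate's output, and the fan-out limit used in the check is a hypothetical value, not one taken from a real technology.

```python
# Net name -> list of (instance, port) connections for the one-bit adder.
NETLIST = {
    "A":  [("xor1", "A"), ("and1", "A"), ("or1", "A")],
    "B":  [("xor1", "B"), ("and1", "B"), ("or1", "B")],
    "Ci": [("xor2", "B"), ("and2", "B")],
    "N1": [("xor1", "Y"), ("xor2", "A")],
    "N2": [("and1", "Y"), ("or2", "A")],
    "N3": [("or1", "Y"), ("and2", "A")],
    "N4": [("and2", "Y"), ("or2", "B")],
}

OUTPUT_PORTS = {"Y"}  # assumption: each gate drives its net from port Y


def fan_outs(netlist):
    """Count the input ports (loads) attached to every net."""
    return {
        net: sum(1 for _, port in conns if port not in OUTPUT_PORTS)
        for net, conns in netlist.items()
    }


def violations(netlist, limit):
    """Return the nets whose fan-out exceeds a (hypothetical) technology limit."""
    return {net: n for net, n in fan_outs(netlist).items() if n > limit}
```

With a hypothetical limit of two loads per net, `violations(NETLIST, 2)` flags nets A and B, the situation in which a synthesis tool would have to consider buffering or a different implementation.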

The collection of components and interconnections is known as a design's netlist; this is unique to each design and subsequently defines its structure and behavior. However, different netlists may define the same behavior through a different structure; this is the fundamental principle of optimization, discussed below.

Having translated the one-bit adder into a combinatorial logic function, the EDA tool can then examine the design in order to optimize it with pre-defined and very efficient library implementations. Although a one-bit full adder can be entered in this fashion, most

highly optimized implementations that run much faster than the straightforward definition.

3.2.2 Structural Description

Having the logical description which describes the design purely as a function of input to output behavior, an EDA tool can then perform synthesis to generate a list of components and connections which implements the desired logical behavior, known as the structural description. From the one-bit adder example above, using fundamental gates the behavior would then be implemented using two each of AND, OR and exclusive-OR components. In reality, this would not be the case; to make full use of EDA's abilities, one would allow the CAD tool to decide how to implement the full adder's logical function. The resulting implementation would then not be our explicit definition in VHDL but something defined in a target library given for the final technology implementation. Commonly, a technology vendor will supply basic libraries for their products along with premium libraries which may perform better than the basic libraries. The basic library would be free and a licensing fee would be paid for the premium offering; the developer would then have a jump start on development, having components which are already optimized for the particular final technology.

A design can be implemented in many different ways which all produce the same logical behavior; however, one implementation may be superior to another due to requiring fewer components or running at a faster clock speed. As with many of the steps in the IC design flow, trade-offs between computational effort and implementation optimization have to be made, as a function of processing time versus component count and operating frequency. A design will typically have a number of timing constraints as a function of the number of inputs and outputs. These constraints generally involve the processing and production of signals with respect to the system clock (which may require a minimum frequency as a constraint) or other signals. Knowing the intended functions and organization of the structural description, the static timing of all paths can be calculated and compared against the required constraints. Optimization then proceeds to satisfy any constraints which have been violated by the structural implementation. Once all constraints have been satisfied, no further optimization is required; the design flow can then proceed to applying the structural design to the physical implementation.

3.3 Library Exchange Format/Design Exchange Format

The Library Exchange and Design Exchange Formats (LEF/DEF) are ASCII text files which are capable of describing a library of components with the technology in which they are implemented and a specific design, respectively. OpenEDA, sponsored by the Silicon Integration Initiative (SI2), maintains the LEF/DEF formats as a community project, meaning that anyone is able to request a license to the standards references and sources, such as in this work [25]. SI2 is an organization of electronics, EDA and semiconductor technology vendors committed to reducing cost and increasing productivity within integrated silicon systems. The purpose of the LEF/DEF project is to create an open standard format with which technologies, libraries and individual designs can be exchanged between organizations using toolsets from different vendors with no translation issues. Most EDA technologies use the GDSII [10] format to represent the physical design and proprietary file formats to represent library and netlist information. The GDSII format is used to transmit a design to a fabrication facility such that it can be constructed in the target physical technology; GDSII is being replaced by the OASIS [34] standard, which offers higher density and 64-bit values. LEF/DEF seeks to give an open option to these binary, proprietary formats [25].

3.3.1 LEF Syntax

technology, vias to interconnect layers and components which describe a library.

As stated above, the LEF file describes both the components and the technology in which a library is implemented; to this end, the LEF portion of a design may then be broken into two separate files. If this is the case, the technology portion of the LEF description must be read first in order to understand how the components are constructed and if they violate any design rules. It is an option to combine both technology and component descriptions; however, just as if two separate files are used, the technology section must be defined first. This allows for a reduction of redundant data as one file can be used to contain the technology description which multiple library files may reference.

Figure 3.5 gives the LEF description of the library used to implement the one-bit adder given in Figure 3.3. As shown, only one type of LEF statement is used to describe the components, the MACRO statement. This statement has many sub-statements defining all properties of the component such as port locations, construction, electrical behavior, etc. The sub-statement which is used here is SIZE, which defines the minimum bounding box which completely covers all elements of the component. Figure 3.4 gives the syntax of a LEF MACRO statement containing only a SIZE sub-statement. Values of interest lie in the width and height values of the SIZE statement, which define the physical box which must be placed onto the semiconductor die. As not all components are perfect rectangles, there are constructs to define which portions of the bounding box are not obstructed, but for the purposes of this work these values are ignored. It is a design rule violation to overlap any portions of two components; doing so will result in a design which will be unable to correctly function. To prevent this, these bounding boxes are used as the boundaries which define if an overlap exists between two components, as shown in Figure 3.6.

MACRO macroname ;
SIZE width BY height ;
END macroname ;

Figure 3.4: LEF Syntax for SIZE-only MACRO Statement

MACRO xor_2
SIZE 1000 BY 1000 ;
END xor_2

MACRO and_2
SIZE 1000 BY 1000 ;
END and_2

MACRO or_2
SIZE 1000 BY 1000 ;
END or_2

Figure 3.5: Example LEF Definition of a One-Bit Adder's Components
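As an illustration of how the SIZE values feed an overlap check, the sketch below reads the SIZE-only MACRO statements of the example library and computes the overlapping area of two bounding boxes. It is a simplified reader for this subset only, not a LEF parser; the function names are hypothetical, sizes are assumed to be integers, and boxes that merely share an edge are assumed not to overlap.

```python
import re

LEF_TEXT = """
MACRO xor_2
  SIZE 1000 BY 1000 ;
END xor_2
MACRO and_2
  SIZE 1000 BY 1000 ;
END and_2
MACRO or_2
  SIZE 1000 BY 1000 ;
END or_2
"""

# Matches only a MACRO that contains a single SIZE sub-statement.
MACRO_RE = re.compile(
    r"MACRO\s+(\S+)\s*;?\s*SIZE\s+(\d+)\s+BY\s+(\d+)\s*;\s*END\s+\1"
)


def parse_macro_sizes(text):
    """Return {macro name: (width, height)} for SIZE-only MACRO statements."""
    return {name: (int(w), int(h)) for name, w, h in MACRO_RE.findall(text)}


def overlap_area(a, b):
    """Overlapping area of two boxes given as (x, y, width, height).

    (x, y) is the south-west corner; the result is 0 when the boxes are
    disjoint or only touch along an edge.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return max(dx, 0) * max(dy, 0)
```

Returning an area rather than a yes/no answer lets the same routine serve either as a legality check or as a graded penalty term in a cost function.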

Figure 3.6: Bounding Box Wirelength Estimation and Overlap Penalty

3.3.2 DEF Syntax

As with the case of the LEF format, it is not necessary to discuss the DEF format in its entirety. The DEF format uses the information given in the LEF file(s) as its reference to define a specific design in terms of instances of library components and interconnections. The DEF format has the provisions to define all aspects related to the component level of a physical design such as instances of components, their position, orientation, the size of the target die, all connections between all instances in the design, etc., and relates closest to the focus of this thesis.

Specifically, this work is looking at individual instances, or components, and their relationship to all other components to which they are connected, known via the netlist. The sample syntax of the DEF format is given below in Figure 3.7; the format clearly shows instantiations of components and provisions to declare their positions on the die with respect to their "Southwest", or bottom left corner. The DEF file is both the primary input

locked by the designer. With this, the DEF file serves as a guide file such that the placement algorithm is not forced to organize large subsections of a design such as memory blocks, arithmetic units, etc. Also clearly shown is the netlist given by each net's name, the component members and the respective ports connected to the net. Having the essentials to physically define a design, instances of components, their positions and interconnections, one can then proceed to translating a logical design given by EDA tools to a physical device using PDA tools.

The example circuit, the one-bit adder, used throughout this document is represented in a DEF file in Figure 3.8, below. One is able to see that the design uses two instances each of the three library components and that each instance has no initial placement position. Also given are the dimensions of the target die, which has an area of 36 million square units; the area required by the design is only six million square units (one million square units for each of the six components). The netlist is clearly shown with all nets having more than one connection (the output nets have only one connection and are not pertinent in this example) and their member components. Normally, for a full physical description, the input/output (I/O) pads of the target die would also be defined and placed as fixed components either by the designer or the PDA tool. The input and output nets would then include these I/O components, which would then be considered part of the placement problem.

DESIGN design_name ;
TECHNOLOGY technology_name ;
DIEAREA ( die_SW_corner_coordinates x_y ) ( die_NE_corner_coordinates x_y ) ;
COMPONENTS numcomps ;
- instance_name library_component + FIXED SW_x_y | PLACED SW_x_y | UNPLACED ;
END COMPONENTS
NETS numnets ;
- netname ( component1 port ) ( component2 port ) ... ;
END NETS
END DESIGN

Figure 3.7: DEF Syntax for Simple Netlisting

DESIGN onebitadder ;
TECHNOLOGY tsmc035 ;
DIEAREA ( -3000 -3000 ) ( 3000 3000 ) ;
COMPONENTS 6 ;
- xor1 xor_2 + UNPLACED ;
- xor2 xor_2 + UNPLACED ;
- and1 and_2 + UNPLACED ;
- and2 and_2 + UNPLACED ;
- or1 or_2 + UNPLACED ;
- or2 or_2 + UNPLACED ;
END COMPONENTS
NETS 7 ;
- A ( xor1 A ) ( and1 A ) ( or1 A ) ;
- B ( xor1 B ) ( and1 B ) ( or1 B ) ;
- Ci ( xor2 B ) ( and2 B ) ;
- N1 ( xor1 Y ) ( xor2 A ) ;
- N2 ( and1 Y ) ( or2 A ) ;
- N3 ( or1 Y ) ( and2 A ) ;
- N4 ( and2 Y ) ( or2 B ) ;
END NETS
END DESIGN

Figure 3.8: Example DEF Definition of a One-Bit Full Adder
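The listing of Figure 3.8 is simple enough to be read with a few regular expressions. The sketch below handles only the subset of DEF shown in this section (DIEAREA, COMPONENTS and NETS with the statements used in the example); it is not a general DEF reader, and the function names are hypothetical.

```python
import re

DEF_TEXT = """
DESIGN onebitadder ;
TECHNOLOGY tsmc035 ;
DIEAREA ( -3000 -3000 ) ( 3000 3000 ) ;
COMPONENTS 6 ;
- xor1 xor_2 + UNPLACED ;
- xor2 xor_2 + UNPLACED ;
- and1 and_2 + UNPLACED ;
- and2 and_2 + UNPLACED ;
- or1 or_2 + UNPLACED ;
- or2 or_2 + UNPLACED ;
END COMPONENTS
NETS 7 ;
- A ( xor1 A ) ( and1 A ) ( or1 A ) ;
- B ( xor1 B ) ( and1 B ) ( or1 B ) ;
- Ci ( xor2 B ) ( and2 B ) ;
- N1 ( xor1 Y ) ( xor2 A ) ;
- N2 ( and1 Y ) ( or2 A ) ;
- N3 ( or1 Y ) ( and2 A ) ;
- N4 ( and2 Y ) ( or2 B ) ;
END NETS
END DESIGN
"""


def die_area(text):
    """Area of the die from the south-west and north-east DIEAREA corners."""
    m = re.search(
        r"DIEAREA\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)",
        text,
    )
    x1, y1, x2, y2 = map(int, m.groups())
    return (x2 - x1) * (y2 - y1)


def parse_def(text):
    """Return (components, nets) for the simplified DEF subset above."""
    comp_block = re.search(
        r"COMPONENTS\s+\d+\s*;(.*?)END\s+COMPONENTS", text, re.S).group(1)
    net_block = re.search(
        r"NETS\s+\d+\s*;(.*?)END\s+NETS", text, re.S).group(1)

    components = {}
    for statement in comp_block.split(";"):
        tokens = statement.strip().lstrip("-").split()
        if not tokens:
            continue
        name, macro, status = tokens[0], tokens[1], tokens[3]
        # PLACED/FIXED statements carry coordinates; UNPLACED ones do not.
        coords = [int(t) for t in tokens[4:] if t.lstrip("-").isdigit()]
        components[name] = (macro, status, tuple(coords) or None)

    nets = {}
    for statement in net_block.split(";"):
        tokens = statement.strip().lstrip("-").split()
        if not tokens:
            continue
        # Each "( instance port )" group is one connection of the net.
        nets[tokens[0]] = re.findall(r"\(\s*(\S+)\s+(\S+)\s*\)", statement)
    return components, nets
```

Applied to the example, `die_area(DEF_TEXT)` gives the 36 million square units quoted in the text, and `parse_def(DEF_TEXT)` yields the six unplaced instances and seven nets.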

3.4 Physical Design Automation

Up to this point in the development flow the design itself has been treated as a logical entity, one performing some function with outputs affected by its state and inputs. This has been broken into subroutines and assigned to elements consisting of combinations of basic logic gates. These have been instantiated in components which by themselves do not comprise a complete design; however, as a hierarchy which has an organization, a full description is achieved. Thus the physical instantiation of the design begins; having the library of components and the specifics of the semiconductor technology from the LEF file(s) and the full list of interconnections from the DEF file, the design can be treated as a tangible item. With this comes the physical manipulation of the components and interconnections in order to produce a usable design.

3.4.1 Placement

As placement will be covered in much more detail later, only the basic operation is described here. Figure 3.9 gives the initial placement of the example carried throughout this document, the one-bit adder. Clearly shown are the six components comprising the example design. In this example, in order to reduce the interconnection length of this design it is necessary to place the components as close together as possible. In general there are other considerations that may make ultra-compact placements undesirable (for example, power density or wiring congestion); here, for simplicity and clarity, they are not considered. Figure 3.10 gives the design after placement has occurred; by comparing to the previous placement one can see that the components are placed near optimally with respect to interconnect wire length; the software Simulated Annealing algorithm (presented in Chapter 6) could not find the best positions for components xor2 and and2. Another notable feature is that there is no overlap between any of the components, a strong requirement for a desirable placement. If one were to ignore overlap during placement, the algorithm would undoubtedly find the naive solution in which all components are placed on top of each other. One reason that the optimal placement was not discovered is that the die area is much larger than the design requires, unnecessarily increasing the search space.

Figure 3.9: One-bit Adder Initial Random Placement

In this example with a die size of 8000 by 8000 units and each component 1000 by 1000 units in size, Equation 4.3 can be used to find the size of the search space (placement points taken to the power of the number of components). Given there are six components which must be placed, the total number of solutions is $1.38 \times 10^{46}$. Reducing the die width and height by half brings the search space down to $5.3 \times 10^{41}$ possible solutions while still allowing room for the entire design without overlap. Clearly, enormous search spaces exist for even the simplest of placement problems.
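The two figures above can be checked with a few lines of arithmetic. The sketch below assumes that the number of placement points along an axis is the die dimension minus the component dimension (7000 points per axis for the full die); this reading is an assumption, chosen because it reproduces the quoted values, and the function name is hypothetical.

```python
def search_space_size(die_w, die_h, comp_sizes):
    """Product of (Xpoints * Ypoints) over all components, as in Equation 4.3."""
    total = 1
    for w, h in comp_sizes:
        x_points = die_w - w  # assumption: south-west corner positions that fit
        y_points = die_h - h
        total *= x_points * y_points
    return total


ADDER = [(1000, 1000)] * 6  # six components, each 1000 by 1000 units

full = search_space_size(8000, 8000, ADDER)
half = search_space_size(4000, 4000, ADDER)

print(f"{full:.2e}")  # 1.38e+46
print(f"{half:.2e}")  # 5.31e+41
```

Halving each die dimension removes more than four orders of magnitude from the search space, which is the motivation for sizing the die close to what the design actually requires.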

As the placement algorithm performs an action that is recorded, the DEF file is modified to reflect the new positions of the components on the die. The following component statements now replace the statements in the initial file given in Figure 3.8.

Figure 3.10: One-bit Adder Final Placement

- xor1 xor_2 + PLACED ( 0 1000 ) ;
- xor2 xor_2 + PLACED ( -50 -50 ) ;
- and1 and_2 + PLACED ( 2000 1000 ) ;
- and2 and_2 + PLACED ( 1000 -50 ) ;
- or1 or_2 + PLACED ( 1000 1000 ) ;
- or2 or_2 + PLACED ( 2000 0 ) ;

Figure 3.11: Example DEF Definition of a One-Bit Full Adder

Here, the coor

keyword PLACED which indicates it has been intentionally placed in that location but may be moved by hand or algorithm. After the placement step the design's layout is then complete with respect to all components' positions; the interconnections must now be routed.
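The quality of the final placement in Figure 3.11 can be estimated numerically. The sketch below uses the half-perimeter of each net's bounding box as a wirelength estimate and checks for overlapping components; treating every pin as sitting at its component's south-west corner, ignoring the output nets and I/O pads, and the function names themselves are simplifying assumptions made for illustration and are not necessarily the cost function used later in this work.

```python
from itertools import combinations

SIZE = 1000  # every library component is 1000 by 1000 units

# South-west corners from the PLACED statements of Figure 3.11.
PLACEMENT = {
    "xor1": (0, 1000), "xor2": (-50, -50),
    "and1": (2000, 1000), "and2": (1000, -50),
    "or1": (1000, 1000), "or2": (2000, 0),
}

# Member components of each net from Figure 3.8.
NETS = {
    "A": ["xor1", "and1", "or1"], "B": ["xor1", "and1", "or1"],
    "Ci": ["xor2", "and2"], "N1": ["xor1", "xor2"],
    "N2": ["and1", "or2"], "N3": ["or1", "and2"], "N4": ["and2", "or2"],
}


def hpwl(members, placement):
    """Half-perimeter of the bounding box enclosing a net's members."""
    xs = [placement[c][0] for c in members]
    ys = [placement[c][1] for c in members]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, placement):
    """Sum of the bounding-box estimates over all nets."""
    return sum(hpwl(members, placement) for members in nets.values())


def overlapping_pairs(placement, size=SIZE):
    """Pairs of equal-size square components whose boxes overlap."""
    return [
        (a, b) for a, b in combinations(placement, 2)
        if abs(placement[a][0] - placement[b][0]) < size
        and abs(placement[a][1] - placement[b][1]) < size
    ]
```

For the placement of Figure 3.11 this estimate totals 9250 units with no overlapping pairs, consistent with the observation that the components are packed closely without violating the overlap rule.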

3.4.2 Routing

Routing is the PDA step that generates the physical interconnections between components given by the netlist in the DEF file and the placement from the last step. Routing, like all EDA/PDA steps, has many challenges which must be overcome or mitigated in order to produce a properly functioning device. Some considerations which must be taken into account are wiring congestion, wire capacitance/inductance coupling, antenna effects, etc. It is likely that some of these considerations may be unresolvable or unacceptable in the routing step, resulting in another round of placement in order to remove the problem. In this case, most of the design will remain fixed and only the problem area will be modified. The placement and routing steps will then iterate until a routable placement is generated.

After the routing step is complete the routing tool updates the DEF file with information indicating the physical layout of the routing. This includes specifying layers used, locations of vias and the shape of the wires implementing the netlist. With this, the physical design process has come to a point where the system is able to be fabricated in the semiconductor technology for which it was originally targeted. This is rarely the final step as further verification and testing is performed to ensure the steps of physically creating the design have not introduced errors with respect to the device's electrical properties.

An interesting consequence of increasing transistor speeds is that wire (signal transmission) delays have begun to become larger than logic delays. This means that the insertion of buffers may result in faster circuits [7] in certain situations. This is important to the EDA/PDA community in that tools will have to take this into consideration during synthesis, placement, and routing. If a long interconnection absolutely has to be constructed, its delay may possibly be mitigated through the insertion of a buffer. The design would then have to be analyzed with this buffer in place to determine if there is a net benefit to its presence. If included, this additional component would then have to be inserted into the design, possibly after the first placement attempt. If further placement attempts eliminate the long interconnect, this buffer may then pose a performance reduction, meaning that it should be removed. Furthermore, placement of the buffer with respect to the location on the interconnect determines its effectiveness; a placement tool would then have to take this into consideration. As logic delays continue to become less than wire delays, the EDA/PDA community will have to integrate the management of performance increasing buffers accordingly.

3.4.3 Back Annotation

This step involves extracting all electrical characteristics of the final translated, placed and routed design in order to allow for very precise simulation. Previous simulations could only approximate the electrical properties of the entire design, not knowing the physical geometries of the device. Now, having the device and its physical layout, the simulator can account for previously unknown factors which may affect the performance or even the correctness of the design in order to ensure that the final product, when packaged and inserted into a circuit, will function as originally envisioned during the first steps of the IC design flow. If any problems are found here the design may be sent back to previous steps, discussed above, to enter modifications which will hopefully correct these problems without introducing others. The final design is translated into a format which a fabricator

Chapter 4

Placement

Placement involves arranging components inside of the floorplan of the die or FPGA targeted for the construction of the integrated circuit implementing the original design. This can be thought of as analogous to arranging an office's floorplan (target die) with different size offices (components) in order to set up the floor in the most efficient way possible. Efficiency has multiple variables, each of which must be considered when determining an acceptable arrangement. Making sure that the employees of the office are situated close to others with whom they will have the most contact is a definite goal; however, making sure the office holds as many employees as possible should also be considered. Allowing room for walkways and other essential constructs is a restriction on both the number of workspaces which can be included and the way they can be placed on the floor.

These comparisons are directly applicable to placing an integrated circuit; both the proximity of interacting components and the density of their placement are of high importance. Additionally, allowing room for structures such as wires must also be considered while arranging the components; if placed too tightly these will have no room or the semiconductor may overheat, thus crippling the circuit much in the same way an office with no walkways or which has employees packed against each other could not function.

Determining a perfect placement solution is an NP-complete problem [30], meaning that for any design of considerable size finding a complete solution requires overwhelming computational effort. For a design with n components, there exists a search space on the order of the number of placement grid points taken to the power n [30]. With time to market, development tool and other administrative concerns, simply allowing a machine to exhaustively operate on trial placements in order to find an optimal solution over a matter of months or investing in cutting edge hardware to reduce the compute time to days are not acceptable methods of CAD assistance. More elegant methods of placement providing very good but not perfect solutions allow for a reduction in computing effort; the cost of a non-optimal solution is deemed acceptable for its payoff in reduced placement time.

The act of placing components onto a die is an example of a combinatorial optimization problem [36]; the general formulation is introduced here. Placement is a minimization problem, meaning that one placement which has lower cost than another is more desirable and a placement which has the least cost is considered as the optimum. Combinatorial optimization may also exist as maximization problems in which the highest cost is considered as the optimum. A specific problem instance, or a unique component set and netlist, can be formalized as a pair $(S, f)$ where $S$ represents the finite set of all possible solutions, or placements, and $f$ gives the cost function by which individual solutions in set $S$ may be compared to one another. The cost function mapping is defined as

$$f : S \rightarrow \mathbb{R} \tag{4.1}$$

which is to say the cost function produces a real value for individual solutions from a given set. As stated above, placement is a minimization problem; this means that a solution is sought that satisfies

$$f(i_{opt}) \leq f(i), \quad \forall i \in S \tag{4.2}$$

A maximization problem uses the formula given above with the inequality reversed, and both minimization and maximization problems use the term optimal to represent the best solution in set $S$, $i_{opt}$, or set of best solutions $S_{opt}$. The globally optimum solution $i_{opt}$ is either a minimal or maximal solution depending on the problem type, minimization or maximization, respectively [1].

than another by evaluating said parameter(s) in a way to meaningfully represent the solution with respect to others. Cost functions are usually unique to the problems in which they are implemented, generally making the comparison of two different optimization problems using the same cost function meaningless if not impossible. Costs can include any number of parameters which are pertinent to finding an optimal solution. As this determines a best solution, implementation of the cost function is a very important portion of the combinatorial optimization problem. Along with actually determining the optimal solution using a cost function, finding the function's representation which will give a specific solution from the search space as an optimal cost is also an NP-complete problem. However, analysis of the problem at hand will provide very good guidelines as to how to define the cost function, e.g., minimizing wirelength for placement.

4.1 Exhaustive Search

The most trivial method that can be used to find the best placement solution is an exhaustive search involving computing the cost of each individual placement inside of the search space and selecting the optimal solution(s); as stated above, an NP-complete operation. Using the naive approach, the quantity of placement solutions is proportional to the number of placement points on the target die taken to the power of the number of components. Specifically, the number of solutions is given by

$$\prod_{n=0}^{i} \left( Xpoints_n \times Ypoints_n \right) \tag{4.3}$$

where $i$ is the number of components and $Xpoints_n$ and $Ypoints_n$ are the number of points in which component $n$ may be placed in the x-axis and y-axis, respectively. As this method is very inefficient for larger designs, it is rarely used. For trivial designs (tens of components), this method will provide the best output for computing effort invested, guaranteeing the optimum solution. In general, this method is not recommended.
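A toy instance makes the cost of exhaustive search concrete. The sketch below enumerates every assignment of three unit-size components to a 3 by 2 grid of placement points; the grid, the two nets and the cost function (bounding-box wirelength, with an infinite cost for two components on the same point) are illustrative assumptions and all names are hypothetical.

```python
from itertools import product

POINTS = [(x, y) for x in range(3) for y in range(2)]  # 6 placement points
COMPONENTS = ["a", "b", "c"]
NETS = [("a", "b"), ("b", "c")]  # hypothetical two-pin nets


def cost(placement):
    """Bounding-box wirelength; overlapping placements are rejected."""
    if len(set(placement.values())) < len(placement):
        return float("inf")  # two components share a point
    total = 0
    for net in NETS:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def exhaustive_search(points, components, cost_fn):
    """Evaluate every point assignment and keep the cheapest one."""
    best, best_cost, evaluated = None, float("inf"), 0
    for positions in product(points, repeat=len(components)):
        evaluated += 1
        placement = dict(zip(components, positions))
        c = cost_fn(placement)
        if c < best_cost:
            best, best_cost = placement, c
    return best, best_cost, evaluated


best, best_cost, evaluated = exhaustive_search(POINTS, COMPONENTS, cost)
```

Even this tiny case requires $6^3 = 216$ cost evaluations, points taken to the power of the number of components, to guarantee the optimum cost of 2.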

4.2 Generalized Hill Climbing/Local Search

This method is a modification of the exhaustive search in which an intermediate solution is kept only if its respective cost is more desirable than the one from which it was generated. Though this method is a derivative of exhaustive search, it has striking resemblance to Simulated Annealing [1]. The concept of a neighborhood structure is introduced here to facilitate generation of new solutions to be compared against the original.

A neighborhood is the set of solutions created by moving away from the current solution by one "step". A step is defined in the same manner as the cost function, dependent on which parameters are pertinent to the problem and usually independent from other problems. For each solution $i \in S$ there exists a set $S_i \subset S$ of solutions that are close to $i$ by one step; $S_i$ is known as the neighborhood of $i$ and any solution $j \in S_i$ is known as a neighboring solution to $i$.

The algorithm usually begins by generating a random solution, computing its cost and generating a neighborhood from the initial solution. The neighborhood is then searched for a better solution as compared to that from which the search neighborhood was generated. If a better solution is found, a new neighborhood is generated from this and the process is repeated. The process continues to iterate until a neighborhood is generated which contains no neighboring solutions with a better cost than the prior solution.

Here, the algorithm only moves along the hill if the given path will take it to a more desirable solution, lower cost in this case. The major downfalls of this method are that it is highly dependent on the initial placement and that it is highly susceptible to being caught in local minima. Being purely greedy, the algorithm will consider a local minimum to be the best solution when more desirable solutions may exist which are only reached through first increasing the solution's cost. Figure 4.1 displays this graphically. A local minimum solution $i$ is defined as

$$f(i) \leq f(j), \quad \forall j \in S_i$$

or $i$ is a solution which has a lower cost than any other solution in its neighborhood.

Figure 4.1: Series of Neighboring Solutions Containing a Local Minimum (horizontal axis: progression of neighboring solutions; the global minimum is marked)
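The behavior described above can be reproduced on a small one-dimensional series of neighboring solutions, in the spirit of Figure 4.1. The sketch below is a best-improvement variant of local search; the cost values are invented for illustration, each solution's neighborhood is assumed to be the solutions immediately before and after it, and the function names are hypothetical.

```python
# Hypothetical costs of a series of neighboring solutions (index = solution).
COSTS = [5, 3, 4, 6, 2, 1, 3]


def cost(i):
    return COSTS[i]


def neighbors(i):
    """Neighborhood of solution i: one step to the left or to the right."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(COSTS)]


def local_search(initial, neighbors_fn, cost_fn):
    """Move to the cheapest neighbor until no neighbor has a better cost."""
    current, current_cost = initial, cost_fn(initial)
    while True:
        best = min(neighbors_fn(current), key=cost_fn, default=None)
        if best is None or cost_fn(best) >= current_cost:
            return current, current_cost  # local minimum reached
        current, current_cost = best, cost_fn(best)


stuck = local_search(0, neighbors, cost)    # stops in the local minimum
optimal = local_search(3, neighbors, cost)  # reaches the global minimum
```

Starting from solution 0 the search stops at solution 1 with cost 3, because escaping would require first accepting the worse cost 4; starting from solution 3 it reaches the global minimum at solution 5 with cost 1. The outcome depends entirely on the initial solution.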

It is very possible that a poor intermediate solution will be selected such that better solutions will never be encountered; this can be avoided by starting the algorithm with a large number of initial solutions. As more initial solutions are used, the probability that a global optimum will be found asymptotically approaches unity [1]. Given this, generation and search of initial solutions is easily parallelizable if independent neighborhoods can be guaranteed, i.e., duplicate search efforts can be eliminated. Mixing this method with other methods, possibly as a final greedy step, may yield better results more efficiently than if used alone [36].
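The effect of using many initial solutions can be sketched on an invented one-dimensional cost landscape with two local minima. The code below is illustrative only: the cost values, the neighborhood (one step left or right) and the function names are assumptions, and the searches are run sequentially although each start is independent and could run in parallel.

```python
import random

# Hypothetical costs; solutions 1 and 5 are local minima, 5 is the global one.
COSTS = [5, 3, 4, 6, 2, 1, 3]


def descend(i):
    """Greedy local search: step to the cheaper neighbor until none is cheaper."""
    while True:
        candidates = [j for j in (i - 1, i + 1) if 0 <= j < len(COSTS)]
        best = min(candidates, key=COSTS.__getitem__)
        if COSTS[best] >= COSTS[i]:
            return i
        i = best


def multi_start(starts):
    """Run a local search from every start and keep the cheapest result."""
    return min((descend(s) for s in starts), key=COSTS.__getitem__)


def random_starts(n_starts, seed=0):
    """Draw initial solutions uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randrange(len(COSTS)) for _ in range(n_starts)]


result = multi_start(random_starts(20))
```

Starts in solutions 0 to 2 end in the local minimum at solution 1, while starts in solutions 3 to 6 reach the global minimum at solution 5; the more starts that are drawn, the less likely it is that all of them fall in the poorer basin.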

4.3 Min-Cut

The min-cut method is a recursive partitioning method which uses the principle that if a floorplan is cut in half, the fewer wires that it cuts, the more efficient the placement. The min-cut operation is performed on the sub-levels of the first cut and so on until only one component is left at the lowest level. There are several problems with this method in its purest form including loss of information from one level to the next; however, techniques can be applied allowing the algorithm's efficiency to be increased. With respect to placement this method is best used to quickly converge on a solution used either as a final result or as a starting point for another algorithm [29] [36].
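A single level of the min-cut principle can be illustrated on the one-bit adder netlist used in Chapter 3. The sketch below tries every balanced split of the six components by brute force and counts the nets crossing the cut; practical min-cut placers use partitioning heuristics and recurse on each half, neither of which is shown here, and the function names are hypothetical.

```python
from itertools import combinations

COMPONENTS = ["xor1", "xor2", "and1", "and2", "or1", "or2"]

# Member components of each net of the one-bit adder.
NETS = {
    "A": ["xor1", "and1", "or1"], "B": ["xor1", "and1", "or1"],
    "Ci": ["xor2", "and2"], "N1": ["xor1", "xor2"],
    "N2": ["and1", "or2"], "N3": ["or1", "and2"], "N4": ["and2", "or2"],
}


def cut_size(left, nets):
    """Number of nets with members on both sides of the cut."""
    left = set(left)
    return sum(
        1 for members in nets.values()
        if 0 < sum(m in left for m in members) < len(members)
    )


def best_bisection(components, nets):
    """Try every equal-size split and return one side of the cheapest cut."""
    half = len(components) // 2
    return min(combinations(components, half),
               key=lambda left: cut_size(left, nets))


left_half = best_bisection(COMPONENTS, NETS)
```

For this netlist the cheapest balanced cut separates xor1, and1 and or1 (the gates fed by inputs A and B) from xor2, and2 and or2, cutting only the three nets N1, N2 and N3.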

4.4 Genetic Algorithms

These algorithms take their form from nature and evolution; that is to say a population is formed, breeding occurs and the members who are most fit for their purpose survive to pass good traits of the species along to the future. In its application to computing
