Modeling of a hardware VLSI placement system: Accelerating the Simulated Annealing algorithm


Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

7-2005

Modeling of a hardware VLSI placement system:

Accelerating the Simulated Annealing algorithm

William Merle Batts Jr.

Follow this and additional works at: http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact RIT Scholar Works.


Modeling of a Hardware VLSI Placement System:

Accelerating the Simulated Annealing Algorithm

by

William Merle Batts Jr.

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science in Computer Engineering

Supervised by

Dr. Marcin Lukowiak, Visiting Assistant Professor of Computer Engineering

Department of Computer Engineering

Kate Gleason College of Engineering

Rochester Institute of Technology

Rochester, New York

July 2005

Approved By:

Dr. Marcin Lukowiak
Visiting Assistant Professor of Computer Engineering
Primary Adviser

Dr. Stanislaw Radziszowski
Professor of Computer Science

Dr. Greg Semeraro


Thesis Release Permission Form

Rochester Institute of Technology

Kate Gleason College of Engineering

Title: Modeling of a Hardware VLSI Placement System: Accelerating the Simulated Annealing Algorithm

I, William Merle Batts Jr., hereby grant permission to the Wallace Memorial Library to reproduce my thesis in whole or part.

William Merle Batts Jr.


Dedication

For all of my friends and family, especially Denise, whose unwavering support made this

Acknowledgments

I'd like to thank my adviser Dr. Marcin Lukowiak for his guidance, wisdom and unending patience. I'd also like to thank my committee members Dr. Stanislaw Radziszowski and Dr. Greg Semeraro, whose input and time given to this work are greatly appreciated.

Abstract

An essential step in the automation of electronic design is the placement of the physical components on the target semiconductor die. The placement step presents the opportunity to reduce costs in terms of wirelength and performance degradation; however, it is compute intensive, and obtaining an optimal solution is NP-complete. As designs have grown in complexity and gate count, obtaining an optimal solution has become infeasible due to time-to-market constraints or the sheer compute effort required. Heuristic algorithms allow efficient but sub-optimal designs to be produced with a reduction in processing time. A widely used algorithm is Simulated Annealing (SA).

The goal of this work was to develop a model that would enable an analysis of the feasibility of developing a hardware-accelerated placement system which uses SA at its core. The SA heuristic was analyzed for possible improvements in efficiency, with focus given to targeting the system for hardware. A solution implementing parallel computing with specialized hardware configurations inside a field programmable gate array (FPGA) was investigated as having the possibility to improve the efficiency of the SA-based algorithm. All supporting subsystems were also described for a hardware-accelerated model.

A large speedup was analytically shown from both accelerating the critical path of the SA algorithm and novel methods of improving SA's efficiency. As data throughput

Contents

Dedication
Acknowledgments
Abstract
Glossary
1 Introduction
2 Motivation
3 Background
3.1 IC Design Flow
3.1.1 Concept - Research & Development
3.1.2 Concept - High Level Design
3.1.3 Design Entry
3.1.4 Synthesis
3.1.5 Placement & Routing
3.1.6 Physical Verification & Simulation
3.1.7 Fabrication
3.2 Electronic Design Automation
3.2.1 Logical Description
3.2.2 Structural Description
3.3 Library Exchange Format/Design Exchange Format
3.3.1 LEF Syntax
3.3.2 DEF Syntax
3.4 Physical Design Automation
3.4.1 Placement
3.4.2 Routing
3.4.3 Back Annotation
4 Placement
4.1 Exhaustive Search
4.2 Generalized Hill Climbing/Local Search
4.3 Min-Cut
4.4 Genetic Algorithms
4.5 Tabu Search
5 Simulated Annealing
5.1 Physical Model
5.2 Application to Combinatorial Optimization
5.3 SA & Placement
5.4 SA Research
5.4.1 Parallel SA
5.4.2 Hardware Assisted SA
5.4.3 Greedy Mixed Perturbations
5.4.4 Multiple Heuristic Combination
6 Software Implementation
6.1 Software Revisions and Lessons Learned
6.1.1 Experimental Code
6.1.2 Initial LEF/DEF Placements
6.1.3 Benchmarks
6.2 Logical Modules
6.2.1 LEF/DEF Reader & Writer
6.2.2 Design Perturbation
6.2.3 Wirelength Estimator
6.2.4 Overlap Calculation
6.2.5 Move Acceptance
6.2.6 Design Update
6.2.7 Temperature Update
6.3 Temperature Schedule
6.4 Benchmark Circuits
6.5 Characterization & Optimization
7 Hardware Model
7.1 Wire Length Estimator
7.2 Overlap Detector
7.3 Move Acceptance Logic
7.4 Interfaces
7.4.1 Datapath Control
7.4.2 Component Supply
7.4.3 Design Update & Algorithm Management
8 Method of Investigation
9 Results
9.1 Software Implementation
9.2 Hardware Model
9.3 Speedup Justification
10 Conclusions
10.1 Discussion
10.2 Future Work
10.2.1 Software Implementation
10.2.2 Control Unit
10.2.3 Data Management
10.3 In Closing
Bibliography

List of Figures

3.1 Simple IC Design Flow
3.2 One-bit Adder Example VHDL Description
3.3 One-bit Adder Logical Schematic
3.4 LEF Syntax for SIZE-only MACRO Statement
3.5 Example LEF Definition of a One-Bit Adder's Components
3.6 Bounding Box Wirelength Estimation and Overlap Penalty
3.7 DEF Syntax for Simple Net Listing
3.8 Example DEF Definition of a One-Bit Full Adder
3.9 One-bit Adder Initial Random Placement
3.10 One-bit Adder Final Placement
3.11 Example DEF Definition of a One-Bit Full Adder
4.1 Series of Neighboring Solutions Containing a Local Minimum
5.1 Molecules' Movements per Temperature Region
5.2 Simulated Annealing Placement Flow Chart
5.3 Bounding Overlap Penalty
5.4 Characteristics Justifying FPGA Application
6.1 Initial Software Algorithm Organization
6.2 Software Module Interaction
6.3 Sample Temperature Schedule
6.4 Software SA Placement Time Profile
6.5 Software SA Placement Cost vs. Iteration
6.6 Initial and Final Placements of Benchmark ibm05
7.1 Top Level RTL Schematic
7.2 Wirelength Estimator RTL Schematic
7.3 Overlap Detector RTL Schematic
7.4 Move Acceptance Logic RTL Schematic
8.1 Test Method

List of Tables

6.1 Benchmark ibm01 - ibm04 Information
6.2 Benchmark ibm01 - ibm04 Time Profiles (% total)
6.3 Cost Delta Function Time Analysis
6.4 Average Cost Delta Function Composite Analysis
7.1 Top Level Hardware Input Interface
7.2 Top Level Hardware Output Interface
9.1 Time Comparison to a Mature Placement Tool
9.2 Die Grid Points per Benchmark
9.3 FPGA Clock Cycle Requirement per Calculation

Glossary

A

ASIC Application Specific Integrated Circuit - An IC which has been constructed to perform a limited range of functions very efficiently, usually faster than a GPP. Its functionality cannot be changed after it has been manufactured.

C

CAD Computer Aided Design - Design work managed and enhanced by the use of computer technology.

component A unit which provides some type of functionality to a hardware design and may be combined with other units to implement more complicated behavior. An adder is a common example of a component: it adds two values and can be combined into a larger design, such as a multiplier.

D

DEF Design Exchange Format - An ASCII text-based format which defines a design's specific organization in terms of instances of components and interconnections. Used in conjunction with LEF. A community project organized by the Silicon Integration Initiative (SI2).

die The semiconductor target on which integrated circuits are constructed.

E

EDA Electronic Design Automation - A CAD technique allowing the design flow of electronic devices to become more manageable, reducing associated overhead.

F

FPGA Field Programmable Gate Array - A digital processing device which can be programmed after it is manufactured, allowing its functionality to be changed.

FSM Finite State Machine - Logic which consists of a finite number of states, with transitions and outputs defined by the current state and possibly the value of inputs.

G

GPP General Purpose Processor - A processor which implements functionality through groups of instructions and generic functional units rather than specialized data structures.

H

HPWL Half Perimeter Wire Length - A method of approximating a net's interconnection wirelength by fully enclosing the net in a minimal bounding box and taking half of the box's perimeter.

I

IC Integrated Circuit - A general term for an electronic device which contains active semiconductor switches and possibly passive devices such as resistors or

IP Intellectual Property - Original work that is, at its core, intangible; an algorithm is an example of this.

L

layer The fundamental construct of a semiconductor device. Different types of layers are combined to implement functionality. Normally metal layers are used for interconnection, while N+, P+ and polysilicon are used to create transistors.

LEF Library Exchange Format - An ASCII text-based format which defines the specific IC technology and components used to implement a design. A community project organized by the Silicon Integration Initiative (SI2).

LUT Look-up Table - A construct used to implement logic functionality. Rather than implementing logic directly, a LUT is a small programmable memory that stores the function's output for each input combination.

N

net A connection between the ports of two or more components or I/O pads; it can be envisioned as wires connecting the ports of components.

netlist The collection of nets and components which describe a specific design.

P

PDA Physical Design Automation - A CAD technique allowing the design flow of electronic devices to become less complicated in terms of satisfying design rule checks, electrical properties, physical organization, etc.

port The point on a component to which a net connects.

R

routing The act of physically defining all nets' connections on the target semiconductor die, usually performed with metal layers.

S

SA Simulated Annealing - A stochastic heuristic which uses a cooling material as a model in order to solve combinatorial problems.
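As a worked form of the HPWL entry above: for a net $n$ whose pins $p$ lie at coordinates $(x_p, y_p)$, the half perimeter of the minimal bounding box is

$$\mathrm{HPWL}(n) = \left(\max_{p \in n} x_p - \min_{p \in n} x_p\right) + \left(\max_{p \in n} y_p - \min_{p \in n} y_p\right)$$

For example, a net with pins at (0, 0), (2, 1) and (1, 3) has a 2 by 3 bounding box, so its HPWL is 2 + 3 = 5. A placement's total wirelength estimate is then commonly taken as the sum of HPWL over all of its nets.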

Chapter 1

Introduction

As time-to-market pressures and integrated circuit (IC) design complexity increase, reducing the time required for any step in the design flow may provide an advantage, both technically and economically. An integral step in the IC design flow is placement, in which the components that define a device's functionality are logically placed on the semiconductor die. Efficient placements are desirable because they improve operation by reducing delays, parasitic losses and, if considered during placement, other cost factors [30]. Finding a globally optimal placement is an NP-complete combinatorial optimization problem [36], so it is not feasible to approach the placement of a large design with a brute-force method [21]. In order to reduce computational requirements when performing placements, heuristics are often employed [2] [33] [40]; these algorithms do not produce optimal results but do produce acceptable solutions (satisfying design constraints) while reducing computing time. For example, there are over 200,000 placement solutions (9! = 362,880) for a nine-component design on a three-by-three grid, all of which must be evaluated in order to find an optimal solution. Using a heuristic method, one can reduce the number of evaluations to hundreds, a clear savings of computing effort. One such heuristic method used in placement is Simulated Annealing (SA) which, as its name implies, is modeled after the cooling of metal and the behavior its molecules exhibit [1]. The focus of this work is to analyze, optimize and accelerate the Simulated Annealing heuristic with respect to its use in implementing a placement algorithm.
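The contrast between exhaustive and heuristic search can be sketched in a few lines of Python, chosen here only for illustration. The sketch first counts the 9! = 362,880 arrangements of nine components on a 3 x 3 grid. It then runs a minimal simulated annealing loop that swaps pairs of components, scores each placement by total half-perimeter wirelength, and accepts a worse placement with probability exp(-delta/T). The netlist NETS, the function names and the cooling parameters (t0, alpha, moves_per_t, t_min) are hypothetical choices for this example. This is not the placement tool developed in this work.

```python
import math
import random

GRID = 3            # 3 x 3 placement grid
N = GRID * GRID     # nine components, one per grid slot

# Brute force would have to evaluate every arrangement: 9! of them.
print(math.factorial(N))  # 362880

# Hypothetical netlist: each net lists the components (0..8) it connects.
NETS = [(0, 4), (1, 2, 5), (3, 6), (4, 7, 8), (2, 8), (0, 1), (5, 6)]

def hpwl(slots):
    """Total half-perimeter wirelength; slots[c] is the grid slot of component c."""
    total = 0
    for net in NETS:
        xs = [slots[c] % GRID for c in net]
        ys = [slots[c] // GRID for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(seed=1, t0=2.0, alpha=0.9, moves_per_t=20, t_min=0.05):
    """Minimal SA placer: swap moves, Metropolis acceptance, geometric cooling."""
    rng = random.Random(seed)
    slots = list(range(N))
    rng.shuffle(slots)                      # random initial placement
    cost = hpwl(slots)
    evaluations = 1
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(range(N), 2)  # perturb: swap two components
            slots[a], slots[b] = slots[b], slots[a]
            new_cost = hpwl(slots)
            evaluations += 1
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with prob. exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
            else:
                slots[a], slots[b] = slots[b], slots[a]  # reject: undo the swap
        t *= alpha                          # lower the temperature
    return slots, cost, evaluations

slots, cost, evaluations = anneal()
print(cost, evaluations)
```

With these parameters the loop runs 36 temperature steps of 20 moves each. It therefore performs 721 cost evaluations, including the initial placement, instead of 362,880. The final cost depends on the seed and is not guaranteed to be optimal.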

This document is organized as follows: Chapter 2 provides a motivation for this work by defining its place in current IC design flows and exploring prior methods of accelerating placement heuristics. Chapter 3 provides the background of electronic design automation and the IC design flow. Chapter 4 describes in depth and compares algorithms which can be used for placement. Chapter 5 provides an analytical description of the Simulated Annealing algorithm and previous research looking to improve its speed as a placement heuristic. Chapter 6 describes the software implementation of a placement tool used to characterize the timing and analyze the critical path of the Simulated Annealing algorithm. Chapter 7 describes the hardware model generated using observations from the software implementation. Chapter 8 gives the analytical process used to validate the hardware model's increase in performance. Chapter 9 details the results of the investigation, while Chapter 10 states conclusions drawn from the investigation results and provides future direction for


Chapter 2

Motivation

It would be difficult to argue that any technique or method allowing for a decrease in the development time of a product would not be desirable. This is especially true for the electronics industry, which has no foreseeable slowdown in innovation and development [35]. Many goods currently ship with IC devices providing specialized capabilities, and the design process for these devices may be the critical path in the product development cycle. To allow further growth in the complexity of these devices without imposing restrictive development time overhead, accelerated methods at the core of the design flow should be sought [41].

As the gate count of these devices increases, so does the associated development time in terms of computational requirements. Computer aided design (CAD) of electronic devices and targeting them to physical fabrication (commonly known as Electronic Design Automation (EDA) and Physical Design Automation (PDA), respectively) has reduced this time to a great extent, but further improvement is required as designs grow. Current EDA and PDA tools automatically satisfy the various requirements particular to fabrication processes while optimizing the design where possible (Mentor Graphics Design Architect, Mach TA and Calibre [24]; Synopsys Galaxy, DesignWare and Discovery [39]; Cadence Encounter [10]). The placement process is compute intensive [30] [36] and represents a significant amount of time in the design flow. As with any of the steps in the design flow, it

this step within a development path. Simulated Annealing [21] is a very common algorithm used to implement a placement tool [2] [32] [33] [40], so accelerating this algorithm has been the center of many studies [11] [14] [18] [23]. Prior work in this field has focused first on improving the Simulated Annealing algorithm by analyzing and modifying the perturbation types and cost functions [14] [18] [19] [28]. Other work has looked to parallel implementation of the Simulated Annealing algorithm purely in software to produce a speedup [1] [11]. The serial nature of the algorithm does not directly lend itself to this approach, though parallel implementation has been shown to be successful [8] [23]. Other approaches to speeding up the Simulated Annealing algorithm have focused on hybrid implementations that use other search methods as an augmentation [18] [20]. Hardware implementations have looked to specialized data and processing structures designed to be implemented in large matrices of execution elements on FPGAs [16] [41].

Analyzing the above prior work, it seems a hardware-accelerated parallel processing approach would have the ability to provide a substantial speedup. Taking direction from [8] [11] [23] and [41], this work first seeks to characterize a pure software placement tool using SA as its core heuristic in order to identify the critical path in the data flow. With this information, a tailored datapath can be developed which would provide some amount of speedup over the software tool. Considering hardware interface requirements (memory access, data structures, etc.), it also seems feasible that the parallel operation of multiple identical datapaths would produce near-linear speedups up to a point. Applying knowledge of the behavior of the SA algorithm when applied to placement also provides novel approaches to a hardware

Chapter 3

Background

3.1 IC Design Flow

Implementing an electronic design in an integrated circuit is by no means a trivial task; a number of steps occur between the conception of the idea and the delivery of the packaged IC. Each of the steps in the design process is highly correlated with the others [30], and the process is usually not executed linearly. Figure 3.1 gives only an overview of the IC design flow steps. This figure is very simplified and gives only a general outline of the entire process, as each step contains several underlying steps.

Figure 3.1: Simple IC Design Flow (Concept, Design Entry, Synthesis, Placement & Routing, Physical Verification & Simulation, Fabrication, with Review/Validation/Testing)

As shown, the process may reverse

Of the following steps, this thesis focuses on placement, where the components of the physical design are arranged in an attempt to produce an optimized layout. All other steps are outside the scope of this work but are important in that together they define a semiconductor design flow.

3.1.1 Concept - Research & Development

The initial phase of design involves the analysis of the original idea, refinement and research. Some initial steps are to determine usefulness, profitability, feasibility, which target technology is used, and overall project goals/requirements. Any of the decisions made here will impact the rest of the process; an example of this would be the target technology. If high performance and volume are project goals, application specific IC (ASIC) design may be targeted, whereas if cost is a limiting factor or the design is to be produced in small volumes or for prototyping, less expensive standard cell or FPGA techniques may be utilized. Each decision here leads the project down a different design flow, so this step must be carefully considered.

3.1.2 Concept - High Level Design

Here the overall system architecture is defined and subsystems appear, which may also be further broken down into smaller components. The high level design is taken from the results of the previous step and represented in modeling tools. At this point, either proprietary or target technology vendor-supplied libraries may be utilized to reduce duplicate design effort, as subsystems of the design may be readily available in these libraries as common components. Early insights into optimizations can be discovered in this step, such as data

3.1.3 Design Entry

This step begins the use of EDA CAD tools and generates the logical representation of the design. Two traditional types of design entry are typically used, schematic capture and text-based modeling languages, with the latter being more popular for large designs.

Schematic capture involves using a GUI to represent components and connections. The designer creates a diagram to define a system, from which the tool then creates an intermediate representation to be passed on to the next step.

Modeling languages offer portability and self-documentation, representing hardware designs through the use of source code. Two very common languages are VHDL [13] and Verilog HDL [12]; these are widely used to define hardware systems and to simulate a design before it is mapped to a target technology. Other languages such as SystemC [26] look to fill in the design flow gaps that the two aforementioned languages leave open by not allowing overall systems to be modeled in great detail.

Many IDEs allow for the concurrent use of schematic entry and a modeling language in order to leverage the strengths of both methods (the high-level design can be viewed in schematics and the low-level components can be viewed in HDL). Using such a tool allows any change made in one method to update the other, maintaining coherency across all views.

A designer can find optimizations in this step through intelligent construction; a good designer will produce efficient, correct source code. Some tools used in this step for modeling language and/or schematic capture development and simulation are Mentor Graphics Design Architect [24], OrCAD PSpice [10], Xilinx IDE (Integrated Design Environment) [42], Mentor Graphics ModelSim [24] and Aldec Active-HDL [5].

3.1.4 Synthesis

The designs entered in the previous step are now translated into library or custom com

size, shape, connection ports, and electrical characteristics. This is the first step in placing the design onto the target semiconductor die.

Each component has ports which serve to move signals to and from an exterior connection, while groups of ports, usually from different components, may be connected together to form nets. It is the unique combination of components and connections which gives one design different characteristics from another.

The choice of technology greatly influences this step. If an FPGA or standard cell library is used, the designer may use a tool to convert the design into its physical form, in which a component list and a netlist will be generated representing the project in circuit form. If full custom ASIC technology is chosen, another designer will have to create components from the output of the previous step, either from scratch or from a generic form component. Some tools used here are Mentor Graphics' Design Compiler [24] and Cadence's BuildGates [10]. At this point the CAD tools in use move from EDA to PDA: the logical representation of the design is complete, and further steps deal with applying the logical design to the physical process chosen in the concept stages.

3.1.5 Placement & Routing

The components from the previous step are physically applied to the floor of the target semiconductor die, and the physical design begins to take shape. Here the placement of each component generated in the previous step with respect to every other component becomes important in order to minimize wiring delays and congestion as well as the target die size. As the focus of this work involves algorithms at the core of this step, placement will be expanded upon later.

Routing (instantiating the connections defined by all nets on the semiconductor die) is of obvious concern, as again wire lengths should be optimized to improve performance. Most designs of appreciable size are automatically placed and routed. Direct manipulation by the designer is sometimes warranted, but this is limited to small areas which require attention. Efficiently routing a

Placement and routing may be performed as two independent sub-steps or as a single integrated step. When they are performed as independent sub-steps, routing requirements must be considered in the placement step so as to provide a routable design. The design may still have to be partially re-placed, as some aspect of the initial placement may create problems in routing. The integrated place-and-route step may suffer from unacceptably long runtimes, as the search space created by combining both steps is much larger than that of either step by itself. With current semiconductor processes offering full routing over the components, where all interconnections exist above the transistor layers, independence of the two steps becomes more reasonable.

3.1.6 Physical Verification & Simulation

At this stage a physical representation of the project is complete, and all timing and electrical characteristics of the system can be known, allowing an accurate simulation to take place. This is known as post-place-and-route simulation and makes use of parasitics extraction, which uses the geometries of each transistor to fully specify a very detailed model. Previous simulations could not account for these values, as they are specific to physical construction, so they were either ignored or estimated. Here the system can be measured to ensure the physical design will meet the criteria set forth in the preceding stages.

With the physical characteristics of the target now available, post-layout verification such as design rule checks and layout versus schematic can be performed to make sure the layout does not violate any fabrication rules and behaves as the designers intend, respectively.

Any mistakes here will likely send the project back to the place and route stage or, worse, the synthesis stage if a major fault is discovered. It is possible that the fundamental design would require modification, at which point there is no choice but to re-design around the problem and re-enter the correction in the logical representation. Care should therefore be taken up to this point to ensure correctness. If the design passes all tests it

3.1.7 Fabrication

If the design is targeted to an ASIC, the physically defined project is created in a semiconductor foundry, first as whole wafers, then as individual dies, and finally as packaged dies ready for use. In order to produce ICs efficiently, a built-in self-test can be included in the design to allow dies to be tested before packaging. In this way, if a die fails its self-test it can be discarded before packaging, saving time and money. If an FPGA is the target of the design, the FPGA is programmed using the bitstream generated by the design suite targeted toward the particular FPGA used. Typically, the manufacturer of the FPGA provides a software package to take a design from a concept to the finished, programmed FPGA without reliance on third-party tools, though third-party suites exist that replace this functionality [5]. The finished product is then marketed and sold, or included in a larger project, depending on its purpose.

3.2 Electronic Design Automation

Electronic Design Automation is a CAD technology aimed at managing the requirements of working with designs targeted to electronic technologies (custom IC, standard cell, FPGA). The term EDA is usually an umbrella term applied to all CAD technologies used to manage designs from ideas to silicon. In this work, however, EDA refers to the CAD tools used before application to the physical process, while physical design automation (PDA), discussed below, involves all CAD tools used to manage a design after this point.

EDA is widely employed as a method to ensure that a group's intellectual property (IP) is properly utilized by allowing modularization of designs and creation of proprietary libraries. Furthermore, and more importantly, EDA tools allow for reuse of previously created IP, reducing duplicate effort. This organizational function is not the primary focus of EDA tools, however; they allow one to navigate overwhelmingly large designs with relative ease. Designs have grown in size both in subsystem hierarchies and pure transistor count

revelation, in 1965 Intel co-founder Gordon Moore stated that IC transistor count will double every two years. In 1971 the Intel 4004 had a transistor count of 2,300; by 1982, with the Intel 286, the count had risen to over 130,000. Known as Moore's Law, this prediction has held true and is foreseen to continue to do so [35].
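These figures can be checked for consistency with a two-year doubling rate. Starting from the 4004's 2,300 transistors, doubling every two years over the 11 years to 1982 predicts

$$2300 \times 2^{(1982-1971)/2} = 2300 \times 2^{5.5} \approx 1.04 \times 10^{5}$$

transistors. The quoted count of about 130,000 is a factor of $130000/2300 \approx 56.5$ above the 4004. That is $\log_2 56.5 \approx 5.8$ doublings in 11 years, or one doubling roughly every 1.9 years, slightly faster than the two-year rate.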

EDA's primary purpose is to provide a method of translating a design input description into a logic description, i.e., combinations of basic logic functions such as AND, OR and XOR. As previously mentioned, the input type could be an integrated development environment (IDE) based schematic capture, where a design is "drawn" in terms of visual elements, or a text-based language such as VHDL. The EDA package will take these inputs and create generic logical descriptions, which may be optimized using libraries either previously created and archived by developers or provided by the target technology supplier.

The strength of EDA CAD technology is its ability to simply represent, navigate and test large designs. EDA tools provide the intelligence to optimize designs with previously and specifically developed components. This saves duplicate development time, allows a group to build upon previous work, and provides superior implementations of systems without requiring intimate knowledge of the target technology, optimal logical function implementations or excessive interaction with the designer.

3.2.1 Logical Description

The logical description represents the first step inside an EDA CAD tool toward the realization of a design as a fully functional electronic device. Taking the generation of a one-bit full adder as a very small example, one can show the translation from input to logic. Usually the full adder is an atomic element of an electronic design, but for this example it can be decomposed. From fundamentals, a one-bit full adder is given as

$$S_o = A \oplus B \oplus C_i \tag{3.1}$$

$$C_o = (A \cdot B) + \big((A + B) \cdot C_i\big) \tag{3.2}$$

where $A$, $B$ and $C_i$ represent the adder's inputs and $S_o$ and $C_o$ represent the adder's sum output and carry output, respectively.

With its behavior defined, the design must be entered using one of the aforementioned methods; the core VHDL is presented in Figure 3.2, while a schematic capture is given by Figure 3.3. Each VHDL statement assigns the logic value of the function on the right-hand side of the signal assignment operator (<=) to the output signal on the left. A VHDL compiler will then analyze the file containing these statements and assemble a logical description, which can be visualized as Figure 3.3. This schematic would also be the input of a schematic capture EDA tool.

So <= A xor B xor Ci;
Co <= (A and B) or ((A or B) and Ci);

Figure 3.2: One-bit Adder Example VHDL Description

With the logical representation of the design, the EDA tools can determine how the system will act given certain inputs; in other words, the design can be tested and examined for correct behavior.

Figure 3.3: One-bit Adder Logical Schematic

Usually, this is the first time the design is tested with actual inputs, and testing usually begins at the module level so that, given known interfaces, individual designers can independently create modules that will produce a working system after

testbench is generated that contains test inputs with known outputs, so that fast go/no-go tests can be executed during development, reducing the time of the code/test/debug cycle. For something as simple as the one-bit adder, an exhaustive test set would be used to check all possible combinations of inputs, whereas in a much larger design a limited test set would be used for spot checks during development. A larger, more complete test would then be executed to ensure that the module will function correctly before integration into the system.
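The exhaustive approach can be sketched as follows. It is written in Python rather than as a VHDL testbench, purely for illustration. The code drives all 2^3 = 8 input combinations through a model of Equations 3.1 and 3.2 and compares each 2-bit result against the arithmetic sum A + B + Ci. The function names are hypothetical; a real flow would apply the same vectors to the HDL model in a simulator.

```python
from itertools import product

def full_adder(a, b, ci):
    """Gate-level model of Eq. 3.1 and 3.2; returns (So, Co) as 0/1 ints."""
    s = a ^ b ^ ci
    co = (a & b) | ((a | b) & ci)
    return s, co

def exhaustive_test():
    """Check every input combination against integer addition; return vector count."""
    count = 0
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        # The 2-bit result {Co, So} must equal the arithmetic sum of the inputs.
        assert 2 * co + s == a + b + ci, (a, b, ci)
        count += 1
    return count

print(exhaustive_test())  # 8
```

Because the input space has only eight points, checking all of them is cheap. The same loop over a wide datapath would grow as 2^n, which is why larger designs fall back on the limited spot-check sets described above.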

The basic building block used to implement a function within a design is known as a logic gate; within the example figure, xor, and, and or are all instances of gates which implement basic logic functions. The elements in this example can also be called components: structures which implement some amount of functionality. In this case the components implement basic functionality, but they may implement more complex functions; it is up to the designer of the library to define their granularity. In synthesizing an arbitrary function, it may be more efficient to implement a multiplexer-based look-up table (LUT) rather than pure logic as shown here. A synthesis tool will perform forward and reverse elimination in order to determine this; its operating theory is outside the scope of this thesis. Here the tool traverses the boundary between logical and structural descriptions in mapping the logic to components. Unless an optimized library component exists for the example one-bit adder, and depending on the level of optimization, it would likely be instantiated in a LUT.
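To illustrate the multiplexer-based look-up table mentioned above, the following Python sketch tabulates each adder output as an 8-entry truth table and then reads an output by using the three inputs as the table address, which is the selection a LUT's multiplexer tree performs in hardware. The helper names and the bit ordering of the address (A as the most significant bit) are arbitrary assumptions.

```python
def build_lut(func, n_inputs: int = 3) -> list[int]:
    """Tabulate a Boolean function; entry k holds func applied to the bits of k (MSB first)."""
    table = []
    for k in range(2 ** n_inputs):
        bits = [(k >> (n_inputs - 1 - pos)) & 1 for pos in range(n_inputs)]
        table.append(func(*bits))
    return table

def lut_read(table: list[int], *inputs: int) -> int:
    """Select one entry, as the LUT multiplexer does, using the inputs as the address."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]

SUM_LUT = build_lut(lambda a, b, ci: a ^ b ^ ci)
CARRY_LUT = build_lut(lambda a, b, ci: (a & b) | ((a | b) & ci))

if __name__ == "__main__":
    print(SUM_LUT, CARRY_LUT)
    print(lut_read(CARRY_LUT, 1, 0, 1))
```

The eight stored bits per output are all the LUT needs; no AND, OR or XOR gates appear in this form of the adder.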

To be defined as a component, a structure must allow interconnections to be made to and from other components. Ports are used to specify the points where these connections are made. A port is, as its name implies, a path which passes from outside the component to the functional elements inside. A port may have properties which specify the direction of the logical data flow, allowing an EDA tool to determine whether a component is used correctly. The interconnections which are made between ports of components are known as nets. Typically, a net connects to only one port on a component and runs between a limited number of components, but it may theoretically connect any number of ports of any number of components.


The electrical properties of the target technology limit the number of input ports an output port can drive. This value is known as fan-out. Conversely, there are real-world limitations on the number of outputs one input can support; this is not usually encountered, as multiple drivers of an input are avoided (or at least advised against during EDA processing). This value is known as fan-in and, along with fan-out, is calculated for each net and verified not to exceed the limits defined by the target technology. In the example above, the nets can be identified as N1, N2, N3, N4, A, B, Ci, So and Co. The input nets A and B experience a fan-out of three each, the input net Ci experiences a fan-out of two, and all internal nets experience a fan-out of one. The output nets So and Co will experience fan-outs determined by the full adder's place in a larger design.

As with function-to-component mapping, a synthesis tool will also consider these loading values when selecting a particular implementation. It is possible that a faster or smaller implementation may violate a loading constraint, which would then require one or more buffers to remedy. This solution may increase a signal's latency, since the signal has to pass through the additional buffer(s), thereby possibly decreasing the maximum operating speed. Selecting an implementation which is less efficient but acceptable in terms of satisfying loading constraints may be a better solution: since no additional buffers are required, signal latency may be reduced and the maximum operating speed may be higher than that of the more efficient implementation. The synthesis tool takes this into account when determining which particular components are used in function mapping.
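The fan-out figures quoted above can be recomputed mechanically from the netlist. In this Python sketch, which is an illustration rather than a description of any particular EDA tool, each net is a list of (instance, port) pairs from the one-bit adder netlist, and every component's output port is assumed to be named Y, as in the example library. Fan-out is the number of non-driver connections on a net; a net with more than one driver is also flagged, which corresponds to the multiple-driver situation the text associates with fan-in. The limit values in the usage example are hypothetical.

```python
# Nets of the one-bit adder as (instance, port) pairs; output ports are named "Y".
NETLIST = {
    "A":  [("xor1", "A"), ("and1", "A"), ("or1", "A")],
    "B":  [("xor1", "B"), ("and1", "B"), ("or1", "B")],
    "Ci": [("xor2", "B"), ("and2", "B")],
    "N1": [("xor1", "Y"), ("xor2", "A")],
    "N2": [("and1", "Y"), ("or2", "A")],
    "N3": [("or1", "Y"), ("and2", "A")],
    "N4": [("and2", "Y"), ("or2", "B")],
}
OUTPUT_PORTS = {"Y"}

def fan_out(connections) -> int:
    """Number of input ports driven by the net."""
    return sum(1 for _, port in connections if port not in OUTPUT_PORTS)

def drivers(connections) -> int:
    """Number of output ports driving the net (primary inputs have none here)."""
    return sum(1 for _, port in connections if port in OUTPUT_PORTS)

def loading_violations(netlist, max_fan_out: int) -> list[str]:
    """Nets that exceed the fan-out limit or have more than one driver."""
    return sorted(
        name for name, conns in netlist.items()
        if fan_out(conns) > max_fan_out or drivers(conns) > 1
    )

if __name__ == "__main__":
    for name, conns in NETLIST.items():
        print(name, fan_out(conns))
    print(loading_violations(NETLIST, max_fan_out=2))
```

With a hypothetical limit of two, nets A and B would be reported, which is the situation in which a synthesis tool would either buffer them or choose a different implementation.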

The collection of components and interconnections is known as a design's netlist; it is unique to each design and consequently defines its structure and behavior. However, different netlists may define the same behavior through a different structure; this is the fundamental principle of optimization, discussed below.

Having translated the one-bit adder into a combinatorial logic function, the EDA tool can then examine the design in order to optimize it with pre-defined and very efficient library implementations. Although a one-bit full adder can be entered in this fashion, most […] highly optimized implementations that run much faster than the straightforward definition.

3.2.2 Structural Description

Having the logical description, which describes the design purely as a function of input-to-output behavior, an EDA tool can then perform synthesis to generate a list of components and connections which implements the desired logical behavior; this list is known as the structural description. For the one-bit adder example above, using fundamental gates, the behavior would be implemented using two each of AND, OR and exclusive-OR components.

In reality, this would not be the case: to make full use of EDA's abilities, one would let the CAD tool decide how to implement the full adder's logical function. The resulting implementation would then not be our explicit definition in VHDL but something defined in a target library given for the final technology implementation. Commonly, a technology vendor will supply basic libraries for its products along with premium libraries which may perform better than the basic libraries. The basic library would be free and a licensing fee would be paid for the premium offering; the developer would then have a jump start on development, having components which are already optimized for the particular final technology.

A design can be implemented in many different ways which all produce the same logical behavior; however, one implementation may be superior to another by requiring fewer components or running at a faster clock speed. As with many of the steps in the IC design flow, trade-offs between computational effort and the optimization of the implementation have to be made, weighing processing time against component count and operating frequency. A design will typically have a number of timing constraints as a function of the number of inputs and outputs. These constraints generally involve the processing and production of signals with respect to the system clock (which may require a minimum frequency as a constraint) or to other signals. Knowing the intended functions and organization of the structural description, the static timing of all paths can be calculated and compared against the required constraints. Optimization then proceeds to satisfy any constraints which have been violated by the structural implementation. Once all constraints have been satisfied, no further optimization is required, and the design flow can then proceed to applying the structural design to the physical implementation.
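Static timing of the kind described above can be sketched as a longest-path calculation over the gate network. The Python below is a minimal illustration only: the gate delays are hypothetical unit values (not from any real library), wire delay is ignored, all primary inputs are assumed to arrive at time 0, and the sum output So is assumed to be driven by xor2. Slack is the clock period minus an output's arrival time; a negative slack marks a violated constraint that optimization would have to fix.

```python
# Hypothetical gate delays in arbitrary time units (not from a real cell library).
DELAY = {"xor_2": 2.0, "and_2": 1.0, "or_2": 1.0}

# Output net -> (instance, cell, input nets), following the one-bit adder netlist.
GATES = {
    "N1": ("xor1", "xor_2", ("A", "B")),
    "So": ("xor2", "xor_2", ("N1", "Ci")),
    "N2": ("and1", "and_2", ("A", "B")),
    "N3": ("or1", "or_2", ("A", "B")),
    "N4": ("and2", "and_2", ("N3", "Ci")),
    "Co": ("or2", "or_2", ("N2", "N4")),
}
PRIMARY_INPUTS = {"A": 0.0, "B": 0.0, "Ci": 0.0}   # assumed arrival times

def arrival_times() -> dict[str, float]:
    """Latest arrival time at every net (longest path from the primary inputs)."""
    arrival = dict(PRIMARY_INPUTS)

    def visit(net: str) -> float:
        if net not in arrival:
            _, cell, inputs = GATES[net]
            arrival[net] = max(visit(n) for n in inputs) + DELAY[cell]
        return arrival[net]

    for net in GATES:
        visit(net)
    return arrival

def check_timing(period: float, outputs=("So", "Co")) -> dict[str, float]:
    """Slack of each output against a single clock-period constraint."""
    arrival = arrival_times()
    return {out: period - arrival[out] for out in outputs}

if __name__ == "__main__":
    print(check_timing(5.0))
```

Under these assumed delays the critical path runs through the two XOR gates to So, so a period shorter than four units would leave negative slack on that output.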

3.3 Library Exchange Format/Design Exchange Format

The Library Exchange and Design Exchange Formats (LEF/DEF) are ASCII text files which are capable of describing a library of components together with the technology in which they are implemented, and a specific design, respectively. OpenEDA, […] library and netlist information. The GDSII format is used to transmit a design to a fabrication facility so that it can be constructed in the target physical technology; GDSII is being replaced by the OASIS [34] standard, which offers higher density and 64-bit values. LEF/DEF seeks to give an open option to these binary, proprietary formats [25].

3.3.1 LEF Syntax

[…] technology, vias to interconnect layers and components which describe a library.

As stated above, the LEF file describes both the components and the technology in which a library is implemented; to this end, the LEF portion of a design may be broken into two separate files. If this is the case, the technology portion of the LEF description must be read first in order to understand how the components are constructed and whether they violate any design rules. It is also an option to combine the technology and component descriptions in one file; however, just as when two separate files are used, the technology section must be defined first. Keeping the technology description in a separate file reduces redundant data, as one file can contain the technology description which multiple library files may reference.

Figure 3.5 gives the LEF description of the library used to implement the one-bit adder given in Figure 3.3. As shown, only one type of LEF statement is used to describe the components: the MACRO statement. This statement has many sub-statements defining all properties of the component, such as port locations, construction, electrical behavior, etc. The sub-statement used here is SIZE, which defines the minimum bounding box that completely covers all elements of the component. Figure 3.4 gives the syntax of a LEF MACRO statement containing only a SIZE sub-statement. The values of interest are the width and height values of the SIZE statement, which define the physical box that must be placed onto the semiconductor die. As not all components are perfect rectangles, there are constructs to define which portions of the bounding box are not obstructed, but for the purposes of this work these values are ignored. It is a design rule violation to overlap any portions of two components; doing so will result in a design which will be unable to function correctly. To prevent this, these bounding boxes are used as the boundaries which define whether an overlap exists between two components, as shown in Figure 3.6.

MACRO macro_name
  SIZE width BY height ;
END macro_name

Figure 3.4: LEF Syntax for SIZE-only MACRO Statement

MACRO xor_2
  SIZE 1000 BY 1000 ;
END xor_2

MACRO and_2
  SIZE 1000 BY 1000 ;
END and_2

MACRO or_2
  SIZE 1000 BY 1000 ;
END or_2

Figure 3.5: Example LEF Definition of a One-Bit Adder's Components
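Reading the SIZE values out of a LEF library is straightforward for the restricted form shown in Figure 3.4. The Python sketch below is a hypothetical helper that handles only SIZE-only MACRO statements; a real LEF macro carries many more sub-statements (pins, obstructions, and so on), which it does not attempt to parse.

```python
import re

# Matches "MACRO name ... SIZE w BY h ; ... END name" for SIZE-only macros.
MACRO_RE = re.compile(
    r"MACRO\s+(\S+)\s+SIZE\s+([0-9.]+)\s+BY\s+([0-9.]+)\s*;\s*END\s+\1(?!\S)"
)

def parse_macro_sizes(lef_text: str) -> dict[str, tuple[float, float]]:
    """Return {macro_name: (width, height)} for each SIZE-only MACRO statement."""
    return {name: (float(w), float(h)) for name, w, h in MACRO_RE.findall(lef_text)}

# The library of Figure 3.5.
LEF_EXAMPLE = """
MACRO xor_2
  SIZE 1000 BY 1000 ;
END xor_2
MACRO and_2
  SIZE 1000 BY 1000 ;
END and_2
MACRO or_2
  SIZE 1000 BY 1000 ;
END or_2
"""

if __name__ == "__main__":
    print(parse_macro_sizes(LEF_EXAMPLE))
```

The backreference `\1` requires the END name to match the MACRO name, so a mismatched statement is simply not recognized.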

Figure 3.6: Bounding Overlap Penalty (diagram of Components 1 to 4, with one dimension labeled 16 units)
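The overlap test illustrated in Figure 3.6 reduces to intersecting axis-aligned rectangles. In the Python sketch below, each component is a box whose south-west corner would come from a DEF placement and whose width and height come from the LEF SIZE statement. Using the overlapping area as the penalty is an assumption made for illustration; the penalty used elsewhere in this work may be defined differently. The check is applied to the final one-bit adder placement listed in Figure 3.11.

```python
from itertools import combinations
from typing import NamedTuple

class Box(NamedTuple):
    x: float   # south-west (bottom-left) corner, as given by a DEF placement
    y: float
    w: float   # width and height, as given by the LEF SIZE statement
    h: float

def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two bounding boxes; 0.0 if they only touch or are disjoint."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0

def total_overlap(boxes: dict[str, Box]) -> float:
    """Sum of pairwise overlap areas over all component pairs."""
    return sum(overlap_area(a, b) for a, b in combinations(boxes.values(), 2))

# Final one-bit adder placement (Figure 3.11); every cell is 1000 BY 1000.
FINAL = {
    "xor1": Box(0, 1000, 1000, 1000),
    "xor2": Box(-50, -50, 1000, 1000),
    "and1": Box(2000, 1000, 1000, 1000),
    "and2": Box(1000, -50, 1000, 1000),
    "or1":  Box(1000, 1000, 1000, 1000),
    "or2":  Box(2000, 0, 1000, 1000),
}

if __name__ == "__main__":
    print(total_overlap(FINAL))
```

Boxes that merely share an edge, as several cells in the final placement do, contribute no penalty.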

3.3.2 DEF Syntax

As with the LEF format, it is not necessary to discuss the DEF format in its entirety. The DEF format uses the information given in the LEF file(s) as its reference to define a specific design in terms of instances of library components and interconnections.

The DEF format has the provisions to define all aspects related to the component level of a physical design, such as instances of components, their position and orientation, the size of the target die, all connections between all instances in the design, etc., and it relates most closely to the focus of this thesis. Specifically, this work looks at individual instances, or components, and their relationship to all other components to which they are connected, as known via the netlist. The sample syntax of the DEF format is given below in Figure 3.7; the format clearly shows instantiations of components and provisions to declare their positions on the die with respect to their "Southwest", or bottom-left, corner. The DEF file is both the primary input […] locked by the designer. With this, the DEF file serves as a guide file, so that the placement algorithm is not forced to organize large subsections of a design such as memory blocks, arithmetic units, etc. Also clearly shown is the netlist, given by each net's name, the component members and the respective ports connected to the net.

Having the essentials to physically define a design (instances of components, their positions and their interconnections), one can then proceed to translate a logical design given by EDA tools into a physical device using PDA tools.

DESIGN design_name ;
TECHNOLOGY technology_name ;
DIEAREA ( die_SW_corner_x die_SW_corner_y ) ( die_NE_corner_x die_NE_corner_y ) ;
COMPONENTS num_comps ;
- instance_name library_component + { FIXED ( SW_x SW_y ) | PLACED ( SW_x SW_y ) | UNPLACED } ;
END COMPONENTS
NETS num_nets ;
- net_name ( component1 port ) ( component2 port ) ... ;
END NETS
END DESIGN

Figure 3.7: DEF Syntax for Simple Netlisting

The example circuit used throughout this document, the one-bit adder, is represented in a DEF file in Figure 3.8, below. One can see that the design uses two instances each of the three library components and that no instance has an initial placement position. Also given are the dimensions of the target die, which has an area of 36 million square units; the area required by the design is only six million square units (one million square units for each of the six components). The netlist is clearly shown, with all nets having more than one connection, along with their member components (the output nets have only one connection and are not pertinent in this example).

Normally, for a full physical description, the input/output (I/O) pads of the target die would also be defined and placed as fixed components, either by the designer or by the PDA tool. The input and output nets would then include these I/O components, which would then be considered part of the placement problem.

DESIGN onebitadder ;
TECHNOLOGY tsmc035 ;
DIEAREA ( -3000 -3000 ) ( 3000 3000 ) ;
COMPONENTS 6 ;
- xor1 xor_2 + UNPLACED ;
- xor2 xor_2 + UNPLACED ;
- and1 and_2 + UNPLACED ;
- and2 and_2 + UNPLACED ;
- or1 or_2 + UNPLACED ;
- or2 or_2 + UNPLACED ;
END COMPONENTS
NETS 7 ;
- A ( xor1 A ) ( and1 A ) ( or1 A ) ;
- B ( xor1 B ) ( and1 B ) ( or1 B ) ;
- Ci ( xor2 B ) ( and2 B ) ;
- N1 ( xor1 Y ) ( xor2 A ) ;
- N2 ( and1 Y ) ( or2 A ) ;
- N3 ( or1 Y ) ( and2 A ) ;
- N4 ( and2 Y ) ( or2 B ) ;
END NETS
END DESIGN

Figure 3.8: Example DEF Definition of a One-Bit Full Adder
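The simplified DEF subset of Figures 3.7 and 3.8 can be read with a few regular expressions. The following Python sketch is not a general DEF parser, and its helper names are hypothetical: it assumes one component or net per ;-terminated statement and accepts placement points without an orientation, as in the figures, whereas full DEF syntax also carries an orientation after a FIXED or PLACED point. It returns the component instances and the netlist from which fan-out or wire length can then be computed.

```python
import re

# One component statement: "- inst cell + STATUS [( x y )]"; orientation is not handled.
COMP_RE = re.compile(
    r"-\s*(\S+)\s+(\S+)\s*\+\s*(FIXED|PLACED|UNPLACED)"
    r"(?:\s*\(\s*(-?\d+)\s+(-?\d+)\s*\))?"
)
PIN_RE = re.compile(r"\(\s*(\S+)\s+(\S+)\s*\)")   # "( instance port )"

def section(text: str, name: str) -> str:
    """Text between 'NAME count ;' and 'END NAME', or '' if the section is absent."""
    m = re.search(rf"^\s*{name}\s+\d+\s*;(.*?)^\s*END\s+{name}", text, re.S | re.M)
    return m.group(1) if m else ""

def parse_def(text: str):
    """Return (components, nets) for the simplified DEF subset."""
    components = {}
    for stmt in section(text, "COMPONENTS").split(";"):
        m = COMP_RE.search(stmt)
        if m:
            inst, cell, status, x, y = m.groups()
            point = None if x is None else (int(x), int(y))
            components[inst] = (cell, status, point)
    nets = {}
    for stmt in section(text, "NETS").split(";"):
        m = re.search(r"-\s*(\S+)", stmt)
        if m:
            nets[m.group(1)] = PIN_RE.findall(stmt)
    return components, nets

DEF_EXAMPLE = """\
DESIGN onebitadder ;
TECHNOLOGY tsmc035 ;
DIEAREA ( -3000 -3000 ) ( 3000 3000 ) ;
COMPONENTS 6 ;
- xor1 xor_2 + UNPLACED ;
- xor2 xor_2 + UNPLACED ;
- and1 and_2 + UNPLACED ;
- and2 and_2 + UNPLACED ;
- or1 or_2 + UNPLACED ;
- or2 or_2 + UNPLACED ;
END COMPONENTS
NETS 7 ;
- A ( xor1 A ) ( and1 A ) ( or1 A ) ;
- B ( xor1 B ) ( and1 B ) ( or1 B ) ;
- Ci ( xor2 B ) ( and2 B ) ;
- N1 ( xor1 Y ) ( xor2 A ) ;
- N2 ( and1 Y ) ( or2 A ) ;
- N3 ( or1 Y ) ( and2 A ) ;
- N4 ( and2 Y ) ( or2 B ) ;
END NETS
END DESIGN
"""

if __name__ == "__main__":
    comps, nets = parse_def(DEF_EXAMPLE)
    print(len(comps), len(nets), nets["A"])
```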

3.4 Physical Design Automation

Up to this point in the development flow, the design itself has been treated as a logical entity, one performing some function with outputs affected by its state and inputs. This has been broken into subroutines and assigned to elements consisting of combinations of basic logic gates. These have been instantiated in components which by themselves do not comprise a complete design; however, organized as a hierarchy, they achieve a full description. Thus the physical instantiation of the design begins. Having the library of components and the specifics of the semiconductor technology from the LEF file(s), and the full list of interconnections from the DEF file, the design can be treated as a tangible item. With this comes the physical manipulation of the components and interconnections in order to produce a usable design.

3.4.1 Placement

As placement will be covered in much more detail later, only the basic operation is described here. Figure 3.9 gives the initial placement of the example carried throughout this document, the one-bit adder. Clearly shown are the six components comprising the example design. In this example, in order to reduce the interconnection length of the design, it is necessary to place the components as close together as possible. In general there are other considerations that may make ultra-compact placements undesirable (for example, power density or wiring congestion); here, for simplicity and clarity, they are not considered. Figure 3.10 gives the design after placement has occurred. By comparing it to the previous placement, one can see that the components are placed near-optimally with respect to interconnect wire length; the software Simulated Annealing algorithm (presented in Chapter 6) could not find the best positions for components xor2 and and2. Another notable feature is that there is no overlap between any of the components, a strong requirement for a desirable placement. If one were to ignore overlap during placement, the algorithm would undoubtedly find the naive solution in which all components are placed on top of each other. One reason that the optimal placement was not discovered is that the die area is much larger than the design requires, unnecessarily increasing the search space.

Figure 3.9: One-bit Adder Initial Random Placement

In this example, with a die size of 8000 by 8000 units and each component 1000 by 1000 units in size, Equation 4.3 can be used to find the size of the search space (placement points taken to the power of the number of components). Given that there are six components which must be placed, the total number of solutions is $1.38 \times 10^{46}$. Reducing the die width and height by half brings the search space down to $5.3 \times 10^{41}$ possible solutions, while still allowing room for the entire design without overlap. Clearly, enormous search spaces exist for even the simplest of placement problems.
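The search-space figures above follow from Equation 4.3 once the number of placement points per component is fixed. The Python sketch below assumes integer grid positions and counts, along each axis, the positions of a component's south-west corner that keep it on the die (die size minus component size plus one); under that assumption it reproduces both numbers quoted in the text to the precision given. Python integers are exact, so the products are computed without overflow.

```python
from math import prod

def placement_points(die: int, size: int) -> int:
    """Integer positions of a cell's south-west corner along one axis of the die."""
    return die - size + 1

def search_space(die_w: int, die_h: int, cells: list[tuple[int, int]]) -> int:
    """Equation 4.3: product over all components of X_points * Y_points."""
    return prod(placement_points(die_w, w) * placement_points(die_h, h) for w, h in cells)

ADDER_CELLS = [(1000, 1000)] * 6   # six 1000 BY 1000 components

if __name__ == "__main__":
    print(f"{search_space(8000, 8000, ADDER_CELLS):.3g}")
    print(f"{search_space(4000, 4000, ADDER_CELLS):.3g}")
```

Each additional component multiplies the count by roughly 4.9 × 10^7 on the 8000 by 8000 die, which is why even small designs are out of reach for exhaustive enumeration.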

Figure 3.10: One-bit Adder Final Placement

As the placement algorithm performs an action that is recorded, the DEF file is modified to reflect the new positions of the components on the die. The following component statements now replace the statements in the initial file given in Figure 3.8.

- xor1 xor_2 + PLACED ( 0 1000 ) ;
- xor2 xor_2 + PLACED ( -50 -50 ) ;
- and1 and_2 + PLACED ( 2000 1000 ) ;
- and2 and_2 + PLACED ( 1000 -50 ) ;
- or1 or_2 + PLACED ( 1000 1000 ) ;
- or2 or_2 + PLACED ( 2000 0 ) ;

Figure 3.11: Example DEF Definition of a One-Bit Full Adder

Here, the coordinates […] keyword PLACED, which indicates that the component has been intentionally placed in that location but may be moved by hand or by an algorithm. After the placement step, the design's layout is complete with respect to all components' positions; the interconnections must now be routed.
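The text judges the final placement by its interconnect wire length. One common way to estimate it, used here only as an illustration and not necessarily the cost function used later in this work, is the half-perimeter wire length (HPWL): for each net, the half perimeter of the smallest rectangle enclosing the centers of its member components. The Python sketch applies it to the final placement above, assuming 1000 by 1000 cells and the netlist of Figure 3.8; port positions inside the cells are ignored.

```python
CELL = 1000   # all three library cells are 1000 BY 1000 (Figure 3.5)

# South-west corners from the final placement.
PLACED = {
    "xor1": (0, 1000), "xor2": (-50, -50), "and1": (2000, 1000),
    "and2": (1000, -50), "or1": (1000, 1000), "or2": (2000, 0),
}

# Member components of each net (Figure 3.8).
NETS = {
    "A": ["xor1", "and1", "or1"], "B": ["xor1", "and1", "or1"],
    "Ci": ["xor2", "and2"], "N1": ["xor1", "xor2"], "N2": ["and1", "or2"],
    "N3": ["or1", "and2"], "N4": ["and2", "or2"],
}

def center(inst: str, placement) -> tuple[float, float]:
    x, y = placement[inst]
    return x + CELL / 2, y + CELL / 2

def hpwl(placement, nets) -> float:
    """Sum over nets of the bounding-box width plus height of the member centers."""
    total = 0.0
    for members in nets.values():
        xs, ys = zip(*(center(m, placement) for m in members))
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

if __name__ == "__main__":
    print(hpwl(PLACED, NETS))
```

Because every cell has the same size, using corners instead of centers would give the same total; with mixed cell sizes the choice of reference point matters.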

3.4.2 Routing

Routing is the PDA step that generates the physical interconnections between components, using the netlist in the DEF file and the placement from the last step. Routing, like all EDA/PDA steps, has many challenges which must be overcome or mitigated in order to produce a properly functioning device. Some considerations which must be taken into account are wiring congestion, wire capacitance/inductance coupling, antenna effects, etc. It is likely that some of these considerations may be unresolvable or unacceptable in the routing step, resulting in another round of placement in order to remove the problem. In this case, most of the design will remain fixed and only the problem area will be modified. The placement and routing steps will then iterate until a routable placement is generated. After the routing step is complete, the routing tool updates the DEF file with information indicating the physical layout of the routing. This includes specifying the layers used, the locations of vias and the shapes of the wires implementing the netlist. With this, the physical design process has come to a point where the system is able to be fabricated in the semiconductor technology for which it was originally targeted. This is rarely the final step, as further verification and testing are performed to ensure that the steps of physically creating the design have not introduced errors with respect to the device's electrical properties.

An interesting consequence of increasing transistor speeds is that wire (signal transmission) delays have begun to become larger than logic delays. This means that the insertion of buffers may result in faster circuits [7] in certain situations. This is important to the EDA/PDA community in that tools will have to take this into consideration during synthesis, placement, and routing. If a long interconnection absolutely has to be constructed, its delay may possibly be mitigated through the insertion of a buffer. The design would then have to be analyzed with this buffer in place to determine whether there is a net benefit to its presence. If included, this additional component would then have to be inserted into the design, possibly after the first placement attempt. If further placement attempts eliminate the long interconnect, this buffer may then reduce performance, meaning that it should be removed. Furthermore, the position of the buffer along the interconnect determines its effectiveness, so a placement tool would have to take this into consideration. As logic delays continue to fall below wire delays, the EDA/PDA community will have to integrate the management of performance-increasing buffers accordingly.
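The trade-off described above can be made concrete with a simple delay model. The Python sketch below assumes the Elmore approximation for a distributed RC wire, whose delay is r·c·L²/2 for a wire of length L with resistance r and capacitance c per unit length, and treats each buffer as a fixed delay that splits the wire into equal segments; driver resistance and buffer input capacitance are ignored, and all parameter values are hypothetical. Under these assumptions buffering pays off on a long wire but only adds delay on a short one, which is why a buffer inserted for a long interconnect may need to be removed after re-placement.

```python
def wire_delay(r: float, c: float, length: float,
               n_buffers: int = 0, t_buf: float = 0.0) -> float:
    """Elmore delay of an RC wire split into equal segments by n_buffers ideal buffers."""
    segments = n_buffers + 1
    seg_len = length / segments
    return segments * 0.5 * r * c * seg_len ** 2 + n_buffers * t_buf

def best_buffer_count(r: float, c: float, length: float,
                      t_buf: float, max_buffers: int = 20) -> int:
    """Number of equally spaced buffers (0..max_buffers) minimizing the modeled delay."""
    return min(range(max_buffers + 1), key=lambda k: wire_delay(r, c, length, k, t_buf))

if __name__ == "__main__":
    for length in (2.0, 10.0):
        k = best_buffer_count(r=1.0, c=1.0, length=length, t_buf=5.0)
        print(length, k, wire_delay(1.0, 1.0, length), wire_delay(1.0, 1.0, length, k, 5.0))
```

With these illustrative values, the short wire is fastest unbuffered, while the long wire benefits from two buffers.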

3.4.3 Back Annotation

This step involves extracting all electrical characteristics of the final translated, placed and routed design in order to allow for very precise simulation. Previous simulations could only approximate the electrical properties of the entire design, not knowing the physical geometries of the device. Now, having the device and its physical layout, the simulator can account for previously unknown factors which may affect the performance or even the correctness of the design, in order to ensure that the final product, when packaged and inserted into a circuit, will function as originally envisioned during the first steps of the IC design flow. If any problems are found here, the design may be sent back to previous steps, discussed above, to enter modifications which will hopefully correct these problems without introducing others. The final design is translated into a format which a fabricator […]


Chapter 4

Placement

Placement involves arranging components inside the floorplan of the die or FPGA targeted for the construction of the integrated circuit implementing the original design. This can be thought of as analogous to arranging an office's floorplan (target die) with offices of different sizes (components) in order to set up the floor in the most efficient way possible. Efficiency has multiple variables, each of which must be considered when determining an acceptable arrangement. Making sure that the employees of the office are situated close to those with whom they will have the most contact is a definite goal; however, making sure the office holds as many employees as possible should also be considered. Allowing room for walkways and other essential constructs restricts both the number of workspaces which can be included and the way they can be placed on the floor.

These comparisons are directly applicable to placing an integrated circuit; both the proximity of interacting components and the density of their placement are of high importance. Additionally, room for structures such as wires must be allowed for while arranging the components: if the components are placed too tightly, the wires will have no room or the semiconductor may overheat, crippling the circuit in much the same way that an office with no walkways, or with employees packed against each other, could not function.

Determining a perfect placement solution is an NP-complete problem [30], meaning that for any design of considerable size, finding a complete solution requires overwhelming computational effort. For a design with n components, there exists a search space on the order of the number of placement grid points taken to the power n [30]. Given time-to-market, development tool and other administrative concerns, neither allowing a machine to operate exhaustively on trial placements for months in order to find an optimal solution, nor investing in cutting-edge hardware to reduce the compute time to days, is an acceptable method of CAD assistance. More elegant methods of placement, which provide very good but not perfect solutions, allow for a reduction in computing effort; the cost of a non-optimal solution is deemed acceptable for its payoff in reduced placement time.

The act of placing components onto a die is an example of a combinatorial optimization problem [36]; the general formulation is introduced here. Placement is a minimization problem, meaning that a placement which has lower cost than another is more desirable, and a placement which has the least cost is considered the optimum. Combinatorial optimization problems may also be maximization problems, in which the highest cost is considered the optimum. A specific problem instance, or a unique component set and netlist, can be formalized as a pair $(S, f)$, where $S$ represents the finite set of all possible solutions, or placements, and $f$ gives the cost function by which individual solutions in set $S$ may be compared to one another. The cost function mapping is defined as

$$f : S \rightarrow \mathbb{R} \tag{4.1}$$

which is to say the cost function produces a real value for each individual solution from a given set. As stated above, placement is a minimization problem; this means that a solution $i_{opt}$ is sought that satisfies

$$f(i_{opt}) \le f(i), \quad \forall i \in S \tag{4.2}$$

A maximization problem uses the formula given above with the inequality reversed, and both minimization and maximization problems use the term optimal to represent the best solution in set $S$, $i_{opt}$, or the set of best solutions $S_{opt}$. The globally optimum solution $i^*$ is either a minimal or a maximal solution, depending on the problem type, minimization or maximization, respectively [1].

[…] than another by evaluating said parameter(s) in a way that meaningfully represents the solution with respect to others. Cost functions are usually unique to the problems in which they are implemented, generally making the comparison of two different optimization problems using the same cost function meaningless, if not impossible. Costs can include any number of parameters which are pertinent to finding an optimal solution. As the cost function determines what counts as a best solution, its implementation is a very important part of the combinatorial optimization problem.

Along with actually determining the optimal solution using a cost function, finding the function's representation which will give a specific solution from the search space as an optimal cost is also an NP-complete problem. However, analysis of the problem at hand will provide very good guidelines as to how to define the cost function, e.g., minimizing wire length for placement.

4.1 Exhaustive Search

The most trivial method that can be used to find the best placement solution is an exhaustive search, which involves computing the cost of each individual placement inside the search space and selecting the optimal solution(s); as stated above, the underlying problem is NP-complete. Using the naive approach, the number of placement solutions is proportional to the number of placement points on the target die taken to the power of the number of components. Specifically, the number of solutions is given by

$$\prod_{n=0}^{i-1} \left( X_{points_n} \times Y_{points_n} \right) \tag{4.3}$$

where $i$ is the number of components, and $X_{points_n}$ and $Y_{points_n}$ are the numbers of points at which component $n$ may be placed along the x-axis and y-axis, respectively. As this method is very inefficient for larger designs, it is rarely used. For trivial designs (a handful of components on a small placement grid), this method will provide the best output for the computing effort invested, guaranteeing the optimum solution. In general, this method is not recommended.
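A direct exhaustive search can be written in a few lines, which also shows why it does not scale. In this Python sketch the problem instance is a hypothetical toy: three unit-size components on a 3 by 3 grid of placement points, two two-pin nets, a cost equal to the total Manhattan distance between connected components, and placements that put two components on the same point rejected outright. The loop visits all (3 · 3)^3 = 729 candidate placements, the count given by Equation 4.3.

```python
from itertools import product

GRID = [(x, y) for x in range(3) for y in range(3)]   # 3 x 3 placement points
COMPONENTS = ["c0", "c1", "c2"]                       # hypothetical instances
NETS = [("c0", "c1"), ("c1", "c2")]                   # hypothetical two-pin nets

def cost(placement: dict) -> int:
    """Total Manhattan wire length over all nets."""
    return sum(abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
               for a, b in NETS)

def exhaustive_search():
    """Evaluate every point-per-component assignment; keep the cheapest legal one."""
    best, best_cost, visited = None, None, 0
    for points in product(GRID, repeat=len(COMPONENTS)):
        visited += 1
        if len(set(points)) < len(points):            # two components on one point
            continue
        placement = dict(zip(COMPONENTS, points))
        c = cost(placement)
        if best_cost is None or c < best_cost:
            best, best_cost = placement, c
    return best, best_cost, visited

if __name__ == "__main__":
    print(exhaustive_search())
```

Adding a fourth component to the same grid multiplies the number of candidates by nine, illustrating the exponential growth described above.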

4.2 Generalized Hill Climbing/Local Search

This method is a modification of the exhaustive search in which an intermediate solution is kept only if its cost is more desirable than that of the solution from which it was generated. Though this method is a derivative of exhaustive search, it has a striking resemblance to Simulated Annealing [1]. The concept of a neighborhood structure is introduced here to facilitate the generation of new solutions to be compared against the original. A neighborhood is the set of solutions created by moving away from the current solution by one "step". A step is defined in the same manner as the cost function: it depends on which parameters are pertinent to the problem and is usually independent of other problems. For each solution $i \in S$, the set $S_i \subset S$ of solutions that are one step away from $i$ is known as the neighborhood of $i$, and any solution $j \in S_i$ is known as a neighboring solution to $i$.

The algorithm usually begins by generating a random solution, computing its cost and generating a neighborhood from this initial solution. The neighborhood is then searched for a solution better than the one from which the neighborhood was generated. If a better solution is found, a new neighborhood is generated from it and the process is repeated. The process continues to iterate until a neighborhood is generated which contains no neighboring solution with a better cost than the prior solution.
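The iterative-improvement loop just described can be sketched as follows. This Python version is an illustrative assumption rather than the procedure used later in this thesis: a solution is a placement of components on a small square grid, one step moves a single component to an adjacent free grid point, and the cost is total Manhattan wire length over two-pin nets. The first improving neighbor found is accepted, and the search stops when the neighborhood contains no better solution, that is, at a local minimum.

```python
import random

def cost(placement, nets):
    """Total Manhattan wire length of two-pin nets."""
    return sum(abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
               for a, b in nets)

def neighbors(placement, size):
    """All placements reachable by moving one component one step to a free grid point."""
    occupied = set(placement.values())
    for comp, (x, y) in placement.items():
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in occupied:
                moved = dict(placement)
                moved[comp] = (nx, ny)
                yield moved

def hill_climb(components, nets, size, rng):
    """Greedy local search from a random legal placement; returns a local minimum."""
    points = rng.sample([(x, y) for x in range(size) for y in range(size)], len(components))
    current = dict(zip(components, points))
    current_cost = cost(current, nets)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current, size):
            c = cost(candidate, nets)
            if c < current_cost:              # accept only strictly better solutions
                current, current_cost, improved = candidate, c, True
                break
    return current, current_cost

if __name__ == "__main__":
    nets = [("a", "b"), ("b", "c"), ("c", "d")]
    print(hill_climb(["a", "b", "c", "d"], nets, size=5, rng=random.Random(1)))
```

Because the integer cost strictly decreases with every accepted move, the loop always terminates.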

Here, the algorithm only moves along the hill if the given path will take it to a more desirable solution, a lower cost in this case. The major downfalls of this method are that it is highly dependent on the initial placement and that it is highly susceptible to being caught in local minima. Being purely greedy, the algorithm will consider a local minimum to be the best solution when more desirable solutions may exist which are only reached by first increasing the solution's cost. Figure 4.1 displays this graphically. A local minimum solution $i$ is defined as […], or, equivalently, $i$ is a solution which has a lower cost than any other solution in its neighborhood.

Figure 4.1: Series of Neighboring Solutions Containing a Local Minimum (horizontal axis: Progression of Neighboring Solutions; the Global Minimum is labeled)

It is very possible that a poor intermediate solution will be selected such that better solutions will never be encountered; this can be mitigated by starting the algorithm from a large number of initial solutions. As more initial solutions are used, the probability that a global optimum will be found asymptotically approaches unity [1]. Given this, the generation and search of initial solutions is easily parallelizable if independent neighborhoods can be guaranteed, i.e., if duplicate search efforts can be eliminated. Mixing this method with other methods, possibly as a final greedy step, may yield better results more efficiently than if it is used alone [36].
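The effect of a local minimum, and of restarting from many initial solutions, can be seen on a one-dimensional landscape like that of Figure 4.1. In this Python sketch the solution space is simply the indices of a hypothetical list of costs, and the neighbors of index i are i - 1 and i + 1. A single greedy descent started at index 0 stops at the local minimum, while running independent descents from every index (each of which could run in parallel) also reaches the global minimum.

```python
COSTS = [5, 3, 4, 6, 2, 1, 3]   # hypothetical cost of solutions 0..6

def descend(start: int, costs=COSTS) -> int:
    """Greedy local search on a line; returns the index of a local minimum."""
    i = start
    while True:
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(costs)]
        best = min(nbrs, key=lambda j: costs[j])
        if costs[best] >= costs[i]:
            return i
        i = best

def multi_start(starts, costs=COSTS) -> int:
    """Run independent descents and keep the cheapest result."""
    return min((descend(s, costs) for s in starts), key=lambda i: costs[i])

if __name__ == "__main__":
    print(descend(0), multi_start(range(len(COSTS))))
```

Starting only from indices 0 to 2 never escapes the local minimum at index 1; the global minimum at index 5 is reached only by descents that begin on its side of the cost peak.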

4.3 Min-Cut

The min-cut method is a recursive partitioning method which uses the principle that when a floorplan is cut in half, the fewer wires the cut crosses, the more efficient the placement. The min-cut operation is performed on the sub-levels of the first cut, and so on, until only one component is left at the lowest level. There are several problems with this method in its purest form, including the loss of information from one level to the next; however, techniques can be applied that allow the algorithm's efficiency to be increased. With respect to placement, this method is best used to converge quickly on a solution, used either as a final result or as a starting point for another algorithm [29] [36].
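The quantity min-cut works with, the number of nets cut by a partition, is easy to compute. The Python sketch below evaluates every balanced bisection of the one-bit adder's six components, using the nets of Figure 3.8, and returns the one cutting the fewest nets. Brute force is only feasible for such a tiny example; practical min-cut placers rely on heuristic partitioners (for example Kernighan-Lin or Fiduccia-Mattheyses), and a full placer would recurse on each half, which is omitted here.

```python
from itertools import combinations

# Member components of each net of the one-bit adder (Figure 3.8).
NETS = {
    "A": {"xor1", "and1", "or1"}, "B": {"xor1", "and1", "or1"},
    "Ci": {"xor2", "and2"}, "N1": {"xor1", "xor2"}, "N2": {"and1", "or2"},
    "N3": {"or1", "and2"}, "N4": {"and2", "or2"},
}
CELLS = sorted(set().union(*NETS.values()))

def cut_size(left: set, nets=NETS) -> int:
    """Number of nets with members on both sides of the partition."""
    return sum(1 for members in nets.values() if members & left and members - left)

def min_bisection(cells=CELLS, nets=NETS):
    """Exhaustively find the balanced two-way split that cuts the fewest nets."""
    best = None
    for half in combinations(cells, len(cells) // 2):
        left = set(half)
        size = cut_size(left, nets)
        if best is None or size < best[1]:
            best = (left, size)
    return best

if __name__ == "__main__":
    print(min_bisection())
```

For this netlist the best split separates the three gates fed by A and B from the three gates fed by their outputs and Ci, cutting three nets.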

4.4 Genetic Algorithms

These algorithms take their form from nature and evolution; that is to say, a population is formed, breeding occurs, and the members that are most fit for their purpose survive to pass good traits of the species along to the future. In its application to computing