• No results found

To appear in the Conference on Computer-Aided Verification, June 21-23, Automatic Verication of. Pipelined Microprocessor Control

N/A
N/A
Protected

Academic year: 2021

Share "To appear in the Conference on Computer-Aided Verification, June 21-23, Automatic Verication of. Pipelined Microprocessor Control"

Copied!
17
0
0

Loading.... (view fulltext now)

Full text

(1)

(this version has an appendix not present in the conference version)

Automatic Verication of

Pipelined Microprocessor Control

JerryR.BurchandDavidL.Dill

ComputerScienceDepartment

StanfordUniversity

Abstract. We describ e a technique for verifying the control logic of

pip elinedmicropro cessors.Ithandlesmorecomplicateddesigns,and

re-quires less human intervention, than existingmetho ds. The technique

automaticallycompares apip elinedimplementationto anarchitectural

description.TheCPUtimeneededforvericationisindep endentofthe

data path width, the register le size, and the numb er of ALU op

er-ations. Debugging information is automaticallypro duced forincorrect

pro cessordesigns.Muchofthep owerofthemetho dresultsfroman

e-cientvaliditycheckerforalogicofuninterpretedfunctionswithequality.

Empiricalresultsincludethe vericationofapip elinedimplementation

ofasubsetoftheDLXarchitecture.

1 Introduction

Thedesignofhigh-p erformancepro cessors is averyexp ensiveand comp etitive

enterprise. Thesp eed with whicha designcan b e completed is acrucial factor

indeterminingits success inthemarketplace. Concernab outdesignerrorsis a

majorfactorindesigntime.Forexample,eachmonthofadditionaldesigntime

oftheMIPS4000pro cessor wasestimatedtocost$3-$8million,and27%ofthe

designtimewassp ent in\vericationandtest"[13].

Web elievethat formalvericationmetho ds could eventuallyhavea

signif-icant economic impacton micropro cessor designs by providingfaster metho ds

forcatchingdesignerrors,resultinginfewerdesigniterationsandreduced

simu-lationtime.Formaximumeconomicimpact,avericationmetho dologyshould:

{ b e abletohandle mo dernpro cessor designs,

{ b eapplicabletotheasp ectsofthedesignthataremostsusceptibletoerrors,

{ b e relativelyfastandrequire littlelab or,and

{ provideinformationtohelppinp ointdesignerrors.

Theb est-knownexamplesofformallyveriedpro cessorshaveb eenextremely

simplepro cessordesigns,whichweregenerallyunpip elined[7,8,15,16].The

ver-icationmetho dsusedrelyontheorem-proversthatrequire agreatdealofvery

skilledhumanguidance(thepracticalunitofformeasuringlab orinthesestudies

(2)

ascontrolcomplexity.Therearemorerecentvericationtechniques[1,17]that

are much more automatic,but theyhave notb een demonstrated on pip elined

pro cessors.

The verication of mo dern pro cessors p oses a sp ecial problem. The

natu-ralsp ecication ofa pro cessor istheprogrammer-levelfunctionalmo del,called

theinstructionsetarchitecture.Suchasp ecicationisessentiallyanop erational

description of a pro cessor that executes each instruction separately, one cycle

p erinstruction.Theimplementation,onetheotherhand,neednotexecute each

instruction separately; several instruction might b e executing simultaneously

b ecause ofpip elining,etc. Formalvericationrequires provingthat thesp

eci-cationandimplementationareina prop errelationship,butthat relationshipis

notnecess arilyeasytodene.

Recently, there have b een successful eorts to verify pip elined pro cessors

using human-guided theorem-provers[11, 19, 20, 22]. However,in all of these

cases,eitherthepro cessor wasextremelysimpleoralargeamountoflab orwas

required.

Althoughtheexampleswehaveattackedarestillmuchsimplerthancurrent

high-p erformancecommercial pro cessors, they are signicantlyb eyond the

ca-pabilitiesofautomaticvericationmetho dsrep orted previously.Themetho dis

targetedtowardserrorsinthemicropro cessorcontrol,which,accordingtomany

designers, is where most of the bugs usually exist (datapaths are usually not

considered dicult,except for oating p oint op erations). Lab or is minimized,

sincethepro cedure isautomaticexcept forthedevelopmentofthedescriptions

ofthesp ecication andimplementation.Whenthe implementationof the

pro-cessor isincorrect, themetho dcanpro duceasp ecic exampleshowinghowthe

sp ecicationisviolated.

Since wewishto fo cusonthe pro cessor control,we assumethatthe

combi-nationallogicinthedatapathiscorrect.Under thisassumption(whichcanb e

formallycheckedusing existing techniques), the dierences b etween the sp

eci-cation and implementationb ehaviors are entirelyin the timingof op erations

andthetransferofvalues.Forexample,whenthesp ecicationstoresthesumof

tworegisters ina destinationregister, theimplementationmayplacetheresult

ina pip eregister, andnotwritetheresulttoitsdestination untilafteranother

instructionhasb egun executing.

Thelogicwehavechosenisthequantier-freelogicofuninterpretedfunctions

andpredicateswithequalityandprop ositionalconnectives.Uninterpreted

func-tionsareused toreprese ntcombinationalALUs,forexample,withoutdetailing

theirfunctionality.Prop ositionalconnectivesandequalityareusedindescribing

controlinthesp ecication andtheimplementation,andincomparingthem.

The validityproblem forthis logicis decidable.Inpractice,the complexity

isdominatedbyhandlingBo olean connectives,justas withrepresentations for

prop ositionallogicsuchas BDDs[2].However,theadditionalexpressivenessof

ourlogicallowsvericationproblemstob edescrib edatahigherlevelof

(3)

Corellahasalsoobservedthat uninterpretedfunctionsandconstantscan b e

usedtoabstractawayfromthedetailsofdatapaths,inordertofo cusoncontrol

issues[9, 10]. Hehas a programfor analyzinglogicalexpressionswhich he has

used for verifying a non-pip elined pro cessor and a prefetch circuit. Although

the details are not presented, his analysis pro cedure app ears to b e much less

ecientthanours,andhedo esnotaddresstheproblemsofsp ecifyingpip elined

pro cessors.

Ourmetho dcanb esplitintotwophases.Therstphasecompilesop erational

descriptions of the sp ecication and implementation,then constructs a logical

formulathatisvalidifandonlyiftheimplementationiscorrectwithresp ec tto

thesp ecication.Thesecondphaseisadecisionpro cedure thatcheckswhether

theformulaisvalid.Thenexttwosectionsdescrib e these twophases. Wethen

giveexp erimentalresultsandconcludingremarks.

2 Correctness Criteria

Thevericationpro cess b egins withthe userprovidingb ehavioraldescriptions

ofan implementationanda sp ecication. For pro cessor verication,thesp

eci-cation describ es how the programmer-visibleparts of the pro cessor state are

up datedwhenone instruction isexecuted everycycle.Theimplementation

de-scriptionshouldb e atthehighestlevelofabstractionthatstillexp osesrelevant

designissues,suchaspip elining.

Eachdescriptionisautomaticallycompiledintoatransitionfunction ,which

takesa state as its rst argument, the current inputs as its second argument,

and returns the next state. The transition function is enco ded as a vector of

symb olicexpressions withone entryforeachstatevariable.AnyHDLcouldb e

usedforthedescriptions,givenanappropriatecompiler.Ourprototyp everier

used a simple HDL based on a small subset of Common LISP. The compiler

translates b ehavioral descriptions intotransition functions through a kind of

symb olicsimulation.

WewriteF

Sp ec andF

Impl

todenotethetransitionfunctionofthesp ecication

andthe implementation,resp ect ively.Werequire that theimplementationand

thesp ecicationhavecorresp ondinginputwires.Thepro cessorswehaveveried

have no explicit output wires since the memory was mo deled as part of the

pro cessor andwedidnotmo del I/O.

Almostallpro cessors haveaninputsettingthat causesinstructionsalready

inthepip elinetocontinueexecutionwhilenonewinstructionsareinitiated.This

istypicallyreferredtoas stallingthepro cessor.IfI

Stall

isaninputcombination

that causes the pro cessor to stall,then the functionF

Impl (1;I

Stall

) represents

theeect ofstallingforonecycle.All instructionscurrentlyin thepip elinecan

b ecompletedbystallingforasucentnumb erofcycles.Thisop erationiscalled

ushing thepip eline,anditis animp ortantpartofourvericationmetho d.

Intuitively,the verier should prove that if the implementationand sp

(4)

matchingtheimplementation and sp ecication is the presenc e ofpartially

ex-ecuted instructions in the pip eline. Various parts of the implementationstate

are up dated at dierent stages of the execution of an instruction, so it is not

necessarily p ossible to nd a p oint where the implementation state and the

sp ecication state can b e compared easily.The verier solves this problem by

simulatingtheeect ofcompletingeveryinstructioninthepip elineb eforedoing

thecomparison. The natural wayto complete every instruction isto ush the

pip eline.

All of this is made more precise in gure1.The implementationcan b e in

anarbitrarystateQ

Impl

(lab eled \OldImplState"in the gure).Tocomplete

the partiallyexecuted instructionsin Q

Impl

, the pip elineis ushed, pro ducing

\FlushedOldImplState".Then,allbuttheprogrammer-visiblepartsofthe

im-plementationstatearestripp edo(wedenethefunctionproj forthispurp ose)

topro duceQ

Sp ec

,the\OldSp ecState".BecauseofthewayQ

Sp ec isconstructed fromQ Impl ,wesaythatQ Sp ec matches Q Impl . OldImpl State NewImpl State ? FImpl(1;I) -F Impl (1;I Stall ) -F Impl (1;I Stall ) r r r r r r -F Impl (1;I Stall ) -F Impl (1;I Stall ) FlushedOld ImplState Flushed New ImplState -proj -proj OldSp ec State NewSp ec State ? FSp ec(1;I)

Fig.1.Commutativediagramforshowingourcorrectnesscriteria.

Let I b e an arbitrary input combination to the pip eline (recall that the

sp ecication andthe implementation arerequired tohavecorresp onding input

wires). Let Q 0 Impl = F Impl (Q Impl

;I), the \New Impl State", and let Q 0 Sp ec = F Sp ec (Q Sp ec

;I),the\NewSp ec State".Weconsider theimplementationto

sat-isfythesp ecicationifandonlyifQ 0

Sp ec

matchesQ 0

Impl

.Tocheckthis,ushand

project Q 0

Impl

, thensee ifthe result isequal to Q 0

Sp ec

, as shownatthe b ottom

ofgure1.

It isoften convenientto use a slightlydierent (but equivalent) statement

ofour correctness criteria.In gure1,there are twodierent paths from \Old

Impl State"to \NewSp ec State".The paththat involvesF

Impl

(1;I)is called

the implementation side of the diagram; the path that involvesF

Sp ec

(1;I) is

called the specication side. For each path, there is a corresp onding function

that is the comp ositionof the functions lab eling the arrows on the path. We

saythattheimplementationsatisesthesp ecicationifandonlyifthefunction

(5)

func-diagrammust commute.

The reader maynotice that gure1 has thesame form as commutative

di-agrams used with abstraction functions. In our case, the abstraction function

representstheeect ofushinganimplementationstateandthenapplyingthe

proj function.Typicalvericationmetho dsrequirethatthereexistsome

abstrac-tionfunctionthat makesthediagramcommute.Incontrast,werequirethatour

sp ecicabstractionfunctionmakesthediagramcommute.

In some cases, it may b e necessary during verication to restrict the set

of \Old Impl States" considered in gure1. In this case, an invariant could

b e provided(theinvariantmust b eclosed under the implementationtransition

function).Alloftheexamples inthispap erwereprovedcorrect withouthaving

touseaninvariant.

NoticethatthesameinputI isappliedtob oththeimplementationandthe

sp ecication in gure1.This is onlyappropriatein thesimple casewhere the

implementationrequires exactly one cycle p er instruction (once thepip eline is

lled).Ifmorethanonecycleissometimesrequired,then ontheextracyclesit

isnecessary to applyI

Stall

tothe inputsof thesp ecication rather thanI. An

exampleofthisisdiscussed insection4.2.

3 CheckingCorrectness

Asdescrib ed ab ove,toverifyapro cessor wemustcheckwhether thetwo

func-tionscorresp ondingto the twosides ofthediagram ingure1 areequal.Each

ofthetwofunctionscan b ereprese ntedbyavectorofsymb olicexpressions.The

vectors have one comp onent for each programmer-visible state variableof the

pro cessor.These expressions can b e computedecientlybysymb olically

simu-latingthe b ehavioraldescriptions of theimplementationand the sp ecication.

Theimplementationissymb olicallysimulatedseveraltimestomo del theeect

ofushingthepip eline.

Leths 1 ;...;s n iandht 1 ;...;t n

ib evectors ofexpressions. Toverifythat the

functionstheyrepresentareequal,wemustcheckwhether eachformulas

k =t

k

isvalid,for1 kn. Before describingouralgorithm forthis, we dene the

logicwe useto enco dethe formulas.

3.1 UninterpretedFunctionswith Equality

Many quantier-free logics that include uninterpreted functions and equality

have b een studied. Unlike most of those logics[18, 21], ours do es notinclude

additionoranyarithmeticalrelations.Forourapplicationofverifying

micropro-cessor control,there do es not app ear to b e anyneed to have arithmetic built

intothelogic(althoughtheabilitytodeclarecertainuninterpretedfunctions to

b easso ciativeand/orcommutativewouldb euseful).

We b egin by describing a subset of the logic we use. This subset has the

(6)

j(htermi=htermi)

jhpredicatesymboli(htermi;...;htermi)

jhpropositionalvariableijtruejfalse

htermi::=ite(hformulai;htermi;htermi)

jhfunction symboli(htermi;...;htermi)

jhtermvariablei:

Notice that the ite op erator can b e used to construct b oth formulas and

terms. We included ite as a primitive b ecause it simplies our case-splitting

heuristicsandb ecause itallowsforecientconstructionoftransitionfunctions

withoutintro ducingauxiliaryvariables.

Thereisnoexplicitquanticationinthelogic.Also,wedonotrequiresp ecic

interpretations forfunctionsymb ols and predicate symb ols.A formulais valid

ifandonlyifitistrueforallinterpretations ofvariables,functionsymb olsand

predicatesymb ols.

Although the ite op erator, together with the constants true and false, is

adequatefor constructingallBo olean op erations, we alsoinclude logical

nega-tionanddisjunctionasprimitivesin ourdecisionpro cedure.Thissimpliesthe

rewriterulesusedtoreduceourformulas,esp eciallyrulesinvolvingasso ciativity

andcommutativityofdisjunction.

Verifyinga pro cessor usuallyrequires reasoningab outstores suchas a

reg-ister le or main memory. We mo del stores as having an unb ounded address

space. If a pro cessor design satises our correctness criteria in this case, then

itis correct foranyniteregister leormemory.Ifcertainconventionsare

fol-lowed,theab ovelogicisadequateforreasoningab outstores.However,wefound

itmore ecienttoaddtwoprimitives,read andwrite,formanipulatingstores.

These primitivesareessentiallythesame astheselect andstoreop erators used

byNelsonandOpp en[18].Ifregleisavariablereprese ntingtheinitialstateof

aregister le,then

write(regle ; addr;data)

represents the storethat resultsfrom writingthe valuedata intoaddress addr

of regle.The valueat address addr in the originalstate of the register le is

denotedby

read(regle ; addr):

Anyexpression that denotesa store,whether itis constructed using variables,

(7)

Pseudo-co de for a simpliedversion ofour decisionpro cedure for checking

va-lidity,alongwitha descriptionof itsbasic op eration,is givenin gure2.This

pro cedureisstillpreliminaryandmayb eimprovedfurther,sowewilljustsketch

themainideasb ehind it.

Ourdecisionpro cedure diersinseveralresp ec ts from earlierwork[18,21].

Arithmeticisnotasourceofcomplexityforouralgorithm,sinceitisnotincluded

in our logic.Inour applications, the p otentiallycomplexBo olean structure of

theformulaswecheckistheprimaryb ottleneck.Thus,wehaveconcentratedon

handlingBo oleanstructure ecientlyin practice.

Another dierence is that we are careful to represent formulas as directed

acyclicgraphs(DAGs)withnodistinctisomorphicsubgraphs.Forthistoreduce

thetime complexityof thevaliditychecker,it is necessary to memoize (cache)

intermediateresults. As shownin gure2,the cachingschemeismore

compli-cated than in standard BDD algorithms[2] b ecause formulas must b e cached

relativetoa setofassumptions.

Thenalmajordierence b etweenouralgorithmandpreviousworkisthat

wedonotrequireformulastob erewrittenintoaBo oleancombinationofatomic

formulas. For example, formulas of the form e

1 = e 2 , where e 1 and e 2 may

containanarbitrarynumb erofite op erators, arecheckeddirectly withoutrst

b eingrewritten.

Detlefs and Nelson[12] have recently develop ed a new decision pro cedure

basedona conjunctivenormalformreprese ntationthat app ears tob e ecient

inpractice.Wehavenotyetb een abletodoathoroughcomparison,however.

Ascheckdo esrecursivecaseanalysisontheformulap,itaccumulatesasetof

assumptionsAthatisusedasanargumenttodeep errecursivecalls(seegure2).

Thisset ofassumptionsmust notb ecome inconsistent.Toavoid such

inconsis-tency, we require that if p

0

is the rst result of simplify(p;A

0

), then neither

choose splitting formula(p

0

)noritsnegationislogicallyimpliedbyA

0

.Wecall

thistheconsistencyrequirementonsimplifyandchoosesplittingformula.

Main-tainingtheconsistency requirementismadeeasierbyrestricting thepro cedure

choosesplittingformulatoreturnonlyatomicformulas(formulascontainingno

ite,or,not orwrite op erations).

Aswritteningure2,ouralgorithmisindistinguishablefromaprop ositional

tautologychecker.Dealingwithuninterpretedfunctionsandequalityisnotdone

inthetoplevelalgorithm.Instead,itisdoneinthesimplifyroutine.Forexample,

giventhe assumptionse

1 =e 2 and e 2 =e 3

, itmust b e p ossibleto simplifythe

formula e

1 6= e

3

to false; otherwise,the consistency requirement wouldnotb e

maintained.Asaresult,simplifyandasso ciatedroutinesrequirealargefraction

oftheco de in ourverier.

In spite ofthe consistency requirement,there is signicantlatitudein how

aggressively formulasare simplied. It mayseem b est to doas much

simpli-cationas p ossible,but our exp erimentsindicate otherwise.Wesee tworeasons

forthis.Ifsimplifydo estheminimalamountofsimplicationnecessary tomeet

(8)

aggres-var s,p 0 ,p 1 : formula; A 0 ,A 1 ,U 0 ,U 1 ,U:setofformula;

G: setofset offormula;

b egin

ifptrue thenreturn ;;

ifpfalse then

print\notvalid";

terminateunsucces sfully;

G:=cache(p);

if9U 2G suchthatU Athen returnU; /*cachehit*/

/*cachemiss */

s:=cho osesplittingformula(p); /*prepare tocasesplit*/

/*dos falsecase*/ A 0 :=A[f:sg; (p 0 ,U 0 ):=simplify(p,A 0 ); U 0 :=U 0 [ check( p 0 ,A 0

); /*assumptionsusedforsfalsecase*/

/*dos truecase*/ A 1 :=A[fsg; (p 1 ,U 1 ):=simplify(p,A 1 ); U 1 :=U 1 [check( p 1 ,A 1

); /*assumptionsusedforstruecase*/

U :=(U

0

0f:sg)[(U

1

0fsg); /*assumptionsused */

cache(p):=G[fUg; /*addcacheentry*/

returnU;

end;

Fig.2.Thepro cedurecheckterminatessuccessfullyiftheformulapislogicallyimplied

by the set of assumptions A; otherwise, it terminates unsuccessfully. Notall of the

assumptionsinAneedb erelevantinimplyingp;whencheckterminatessuccessfully

it returns those assumptions that were actually used. The set of assumptions used

need notb e one ofthe minimalsubsets of A that impliesp. Checkingwhether p is

valid is done by letting A b e the emptyset. Initially,the global lo okup table cache

returns the emptysetforevery formula p.Later, cache(p) returns the set containing

thoseassumption sets that haveb een sucientto imply pinprevious callsof check.

Thepro cedurechoosesplittingformula heuristicallycho osesaformula tob eusedfor

casesplitting.Thecallsimplify(p;A

0

) returns as itsrst resulta formulaformedby

simplifyingpundertheassumptionsA

0

.ThesecondresultisthesetofformulasinA

0

(9)

(resultinginmore callstosimplify),thetotalCPUtimeused mayb ereduced.

Thesecond reasonismoresubtle. Supp osewearecheckingthe validityofa

formula pthat has a lot of sharedstructure when represe nted as a DAG.Our

hop e is that by caching intermediate results, the CPU time typically needed

for validity checking grows with the size of the DAG of p, rather than with

the size of its tree represe ntation. This can b e imp ortant in practice; for the

DLX example (section4.2) it is not unusual for the tree represe ntation of a

formulatob etwoordersofmagnitudelargerthantheDAGrepresentation.The

more aggressive simplify is, the more the shared structure of p is lost during

recursivecaseanalysis,whichapp ears toresultin worsecachep erformance.We

arecontinuingtoexp eriment withdierentkindsof simplicationstrategiesin

ourprototyp eimplementation.

Unlike the algorithm in gure2, our validity checker pro duces debugging

information for invalid formulas.This consists of a satisable set of (p ossibly

negated)atomicformulasthatimpliesthenegationoftheoriginalformula.When

verifyingamicropro cessor, thedebugginginformationcan b eused toconstruct

asimulationvectorthat demonstratesthebug.

There is another imp ortantdierence b etween ourcurrent implementation

andthealgorithmingure2.Let(p

0 ;U

0

)b e theresultofsimplify(p;A

0 ).

Con-trary to the description in gure2, in our implementation U

0

is not required

to b e a subset of A

0

. All that is required is that allof theformulas in U

0 are

logicallyimpliedbyA

0

,andthattheequivalenceofpandp

0

islogicallyimplied

byU

0

.Asa result,somethingmoresophisticatedthan subtractingoutthesets

fsg and f:sg must b e done to compute a U that is weak enoughto b e

logi-callyimpliedbyA(see gure2). Asecond complicationisthatndinga cache

hitrequirescheckingsucientconditionsforlogicalimplicationb etweensets of

formulas,rather than just set containment.However,dealingwith these

com-plicationsseems to b e justied sincethe cachehit ratioisincreased by having

simplifyreturnaU

0

thatisweakerthanitcouldb eifithadtob easubsetofA

0 .

Wearestillexp erimentingwithideasforbalancingthese issuesmore eciently.

4 Experimental Results

Inthissection,wedescrib eempiricalresultsforapplyingourvericationmetho d

toapip elined ALU[5]anda subsetoftheDLXpro cessor[14].

4.1 Pip elinedALU

The3-stage pip elinedALUwe considered (gure3) hasb een used as a b

ench-markforBDD-basedvericationmetho ds[3,4,5,6].Anaturalwaytocompare

thep erformanceofthese metho ds is tosee howthe CPU timeneededfor

veri-cationgrowsasthepip elineis increasedinsizeby(forexample) increasingits

datapathwidth worits register lesize r . ForBurch,ClarkeandLong[4]the

(10)

eargrowthin b othwandr . Sublineargrowthinrandsub quadraticgrowthin

wwasachievedbyBryant,BeattyandSeger[3].

Control

Register file

ALU

Op2

Read ports

Write port

Op1

Instruction inputs

Fig.3.3-stagepip elinedALU.Ifthestallinputistrue,thennoinstructionisloaded.

Otherwise,thesrc1andsrc2inputsprovidetheaddressoftheargumentsintheregister

le,theopinputsp eciestheALUop erationtob ep erformed onthearguments,and

thedestinputsp eciesweretheresultistob ewritten.

In our vericationmetho d, the width of the data path and the numb er of

registersand ALUop erationscan b eabstracted away.Asa result,one

verica-tionruncancheckthecontrollogicofpip elineswithanycombinationofvalues

fortheseparameters.Atotalof370millisecondsofCPUtime(runningcompiled

LucidCommonLISP ona DECstation5000/240)isrequired todo acomplete

verication run, including loading and compiling b ehavioral descriptions,

au-tomaticallyconstructing the abstraction function and relatedexpressions, and

checking the validity of the appropriate formula. The validity checking itself,

theprimaryb ottleneckonlargerexamples,onlyrequired50millisecondsforthe

pip elinedALU.

4.2 DLXPro cessor

Hennessy andPatterson[14] designedthe DLXarchitecture to teachthe basic

(11)

load word, unconditional jump, conditional branch (branch when the source

register is equal tozero), 3-register ALUinstructions,and ALUimmediate

in-structions.Aswiththepip elinedALUdescrib edearlier,thesp ecicsoftheALU

op erations are abstracted awayin b oth the sp ecication and the

implementa-tion.Thus,ourvericationcoversanysetofALUop erations,assumingthatthe

combinationalALUin thepro cessor hasb eenseparately veried.

Our DLXimplementationhasa standard 5-stagepip eline.The DLX

archi-tecture hasnobranchdelay slot;ourimplementationuses the\assume branch

nottaken"strategy.Nopip eliningisexp osedintheDLXarchitecture orinour

sp ecicationofit.Thus,itistheresp onsibilityoftheimplementationtoprovide

forwardingofdataanda loadinterlo ck.

Theinterlo ckandthelackofa branchdelayslotmeanthat thepip eline

ex-ecutes slightlylessthanone instructionp er cycle,onaverage.Thiscomplicates

\synchronizing" the implementation and the sp ecication during verication,

since the sp ecication executes exactly one instruction p er cycle. Weaddress

theprobleminamannersimilartothatusedbySaxeetal.[20].Theusermust

provide a predicate on the implementation states that indicates whether the

instruction to b e loaded onthe current cycle will actuallyb e executed by the

pip eline. While this predicate can b e quite complicated, it is easy to express

inourcontext,using internalsignals generated bytheimplementation.In

par-ticular,ourpip eline will notexecute the current instruction ifand only ifone

ormore of thefollowingconditionsholds:the stallinputis asserted,the signal

indicatinga takenbranchis asserted,or thesignal indicatingthat the pip eline

hasb eenstalledbytheloadinterlo ckisasserted.

When internal signals are used in this way, it is p ossible for bugs in the

pip elinetoleadtoa falsep ositivevericationresult.Inparticular,thepip eline

mayapp earcorrectevenifitcangetintoastatewhereitrefusestoeverexecute

anotherinstruction(akindoflivelo ck).Toavoidthep ossibilityofafalsep ositive,

we automatically check a progress condition that insures that livelockcannot

o ccur.TheCPUtimeneeded forthischeckisincludedinthetotalgivenb elow.

Oursp ecication hasfourstate variables:theprogramcounter,theregister

le,thedatamemoryandtheinstructionmemory.Ifthedatamemoryandthe

instruction memory are combined into one store in the sp ecication and the

implementation,then theverier willdetect that the pip elinedo es not satisfy

thesp ecicationforcertaintyp esofself-mo difyingco de(thishasb eenconrmed

exp erimentally).Separatingthetwostoresisonewaytoavoidthisinappropriate

negativeresult.

Foreachstatevariableofthesp ecication,theverier constructsan

appro-priate formula and checks its validity. Since neither the sp ecication nor the

implementationwrite to the instruction memory, checking the validity of the

corresp ondingformulaistrivial.Checkingtheformulasfortheprogramcounter,

thedatamemoryandthe registerlerequires 15.5seconds,34seconds and9.5

seconds of CPU time, resp ectively. The total CPU time required for the full

(12)

Inanothertest, weintro ducedabugin theforwardinglogicofthepip eline.

Theverierrequiredab out8 secondsto generate3 counter-examples,one each

forthethreeformulasthathadtob echecked.These counter-examplesprovided

sucient conditionson a initialimplementation state where the eects of the

bugwouldb e apparent. Thisinformationcan b e analyzedby hand,or used to

constructa startstatefora simulatorrun thatwouldexp osethebug.

5 Concluding Remarks

The need for improved debugging to ols is now obvious to everyone involved

in pro ducing a new pro cessor implementation. It is equally obvious that the

problem is worsening rapidly: driven by changesin semiconductor technology,

architecturesaremovingsteadilyfromthesimpleRISCmachinesofthe1980s

to-wardsverycomplexmachineswhichaggressivelyexploitconcurrencyforgreater

p erformance.

Althoughwehavedemonstratedthatthetechniquespresentedherecanverify

more complex pro cessors with much less eort than previous work, examples

suchas ourDLXimplementationarestillnotnearly ascomplexascommercial

micropro cessor designs. Wehavealso notyet dealt with memorysystems and

interrupts,whicharerichsourceofbugs inpractice.

It will b e very challenging to increase the capacity of vericationto ols as

quicklyas designers are increasing the scaleof the problem. Clearly, the

com-putationaleciency oflogicaldecisionpro cedures(inpractice,notintheworst

case) will b e a major b ottleneck. If decision pro cedures cannot b e extended

rapidlyenough,itmaystillb e p ossible touse some ofthe sametechniques for

partialvericationorina mixedsimulation/vericationto ol.

References

1. D.L.Beatty. AMethodologyfor FormalHardwareVerication,withApplication

to Microprocessors. PhD thesis, Scho ol of Computer Science, Carnegie Mellon

University,Aug.1993.

2. K.S.Brace,R.L. Rudell,and R.E.Bryant. Ecientimplementation ofaBDD

package. In27thACM/IEEEDesignAutomationConference,1990.

3. R.E. Bryant,D.L.Beatty,and C.-J.H. Seger. Formalhardwarevericationby

symb olicternary trajectoryevaluation. In28thACM/IEEEDesignAutomation

Conference,1991.

4. J.R.Burch,E.M.Clarke,andD.E.Long. Representingcircuitsmoreeciently

insymb olicmo delchecking.In28thACM/IEEEDesignAutomationConference,

1991.

5. J.R.Burch,E.M.Clarke,K.L.McMillan,andD.L.Dill. Sequentialcircuit

ver-icationusingsymb olicmo del checking. In27thACM/IEEEDesignAutomation

(13)

NineteenthAnnualACM Symposium on Principleson ProgrammingLanguages,

1992.

7. A.J.Cohn. Apro ofofcorrectnessoftheVip ermicropro cessors:Therstlevel.In

G.Birtwistleand P.A.Subrahmanyam,editors, VLSISpecication,Verication

andSynthesis,pages27{72.Kluwer,1988.

8. A.J. Cohn. Correctness prop erties of the Vip erblo ck mo del: Thesecond level.

In G.Birtwistle,editor, Proceedingsof the1988 DesignVericationConference.

Springer-Verlag,1989. AlsopublishedasUniversityofCambridgeComputer

Lab-oratoryTechnicalRep ortNo.134.

9. F.Corella. Automatedhigh-levelverication againstclo ckedalgorithmic sp

eci-cations. TechnicalRep ortRC18506,IBMResearchDivision,Nov.1992.

10. F.Corella. Automatic high-levelverication against clo ckedalgorithmic sp

eci-cations. InProceedingsof theIFIP WG10.2Conference onComputer Hardware

DescriptionLanguagesandtheirApplications,Ottawa,Canada,Apr. 1993.

Else-vierSciencePublishersB.V.

11. D.Cyrluk. Micropro cessor vericationinPVS:Ametho dologyandsimple

exam-ple. Technical Rep ort SRI-CSL-93-12, SRI Computer Science Lab oratory, Dec.

1993.

12. D.DetlefsandG.Nelson. Personalcommunication,1994.

13. J.L.Hennessy. Designingacomputerasamicropro cessor:Exp erienceandlessons

fromtheMIPS4000. AlectureattheSymp osiumonIntegratedSystems,Seattle,

Washington,March14,1993.

14. J.L. HennessyandD.A.Patterson. Computer Architecture:A Quantitative

Ap-proach.MorganKaufmann, 1990.

15. W.A. Hunt,Jr. FM8501: Averied micropro cessor. Technical Rep ort47,

Uni-versityofTexasatAustin,InstituteforComputingScience,Dec.1985.

16. J.Joyce, G.Birtwistle, and M.Gordon. Proving a computer correct in higher

orderlogic.TechnicalRep ort100,ComputerLab.,UniversityofCambridge,1986.

17. M.Langevin and E.Cerny. Vericationofpro cessor-likecircuits. In P.Prinetto

andP.Camurati,editors,AdvancedResearchWorkshoponCorrectHardware

De-signMethodologies,June1991.

18. G.Nelson and D.C. Opp en. Simplication byco op erating decision pro cedures.

ACMTrans.Prog.Lang.Syst.,1(2):245{257,Oct.1979.

19. A.W. Rosco e. Occam in the sp ecication and verication of micropro cessors.

PhilosophicalTransactionsoftheRoyalSocietyofLondon,SeriesA:Physical

Sci-encesandEngineering,339(1652):137{151 ,Apr.15,1992.

20. J.B.Saxe,S.J.Garland,J.V.Guttag,andJ.J.Horning.Usingtransformations

and verication incircuit design. Technical Rep ort78, DEC Systems Research

Center,Sept.1991.

21. R.E.Shostak. Apracticaldecisionpro cedureforarithmeticwithfunctionsymb ols.

J.ACM,26(2):351{360,Apr.1979.

22. M.Srivas and M.Bickford. Formal verication of a pip elined micropro cessor.

IEEESoftware,7(5):52{64,Sept.1990.

A Illustrative Example

In this app endix, we give a more detailed description of howwe veried the

(14)

gures4and5isgeneratedbysimplesyntactictransformationsfromthe

LISP-likedescriptionlanguagethatisusedwithourprototyp everier.Noticethatthe

pip eliningoftheALUiscompletelyabstracted awayinthesp ecication,but is

quiteexplicitinthedescriptionoftheimplementation.

The twob ehavioraldescriptions are automaticallycompiled intotransition

functions(gures 6and7). Thetransitionfunctionsareenco dedbyasso ciating

eachstatevariablewithanexpressionthatrepresen tshowthatstatevariableis

up datedeachcycle. Theexpressions arein thelogicofuninterpreted functions

and equality describ ed earlier. The concrete syntax for expressions in gures

6through10 is that used in our prototyp e verier, which is implemented in

LISP.

The verier automatically pro duces an expression for the appropriate

ab-straction function(gure8). Theabstraction functionand thetransition

func-tionsareused toautomaticallypro duceexpressions (gures9and10)that

cor-resp ond to the two sides of the commutative diagram in gure1. The

imple-mentationiscorrectifandonlyiftheseexpressionsareequivalent,whichcanb e

automaticallyveriedusingthevaliditycheckerdescrib ed earlier.

if stall

then regfile' := regfile;

else

regfile' :=

write(regfile, dest,

alu(op, read(regfile, src1)

read(regfile, src2)));

Fig.4.Sp ecicationforthepip elinedALUingure3.Primedvariablesrefertonext

statevalues;unprimedvariablesrefertocurrentstatevalues.Noticethatpip eliningis

(15)

then regfile' := regfile;

else regfile' := write(regfile, dest-wb, result);

bubble-wb' := bubble-ex;

dest-wb' := dest-ex;

result' := alu(op-ex, arg1, arg2);

bubble-ex' := stall;

dest-ex' := dest;

op-ex' := op;

if (not bubble-ex) and (dest-ex = src1)

then arg1' := result';

else arg1' := read(regfile', src1);

if (not bubble-ex) and (dest-ex = src2)

then arg2' := result';

else arg2' := read(regfile', src2);

Fig.5.Implementationofthepip elinedALU.Therstthreelinesofco desp ecifyhow

theregisterleisup datedbythenal(\writeback"or\wb")stageofthepip eline.The

nextthreelinessp ecifyhowthreepip eregistersareup dated bythesecond(\execute"

or \ex") stage. The remaining co de sp ecies how ve additional pip e registers are

up dated bytherststageof thepip eline.Noticethat arg1 isup dated withthenext

state(rather thanthecurrentstate)valueofeitherresultor regle;this isnecessary

fordataforwardingtoworkprop erlyinthe pip eline.

regfile: (ite stall

regfile (write regfile dest (alu op (read regfile src1) (read regfile src2))))

Fig.6.Sincetheregisterleistheonlystatevariableinthesp ecication,the

transi-tionfunctionhasonlyonecomp onent, whichisrepresented bytheab oveexpression.

ThisexpressionisautomaticallycompiledfromaLISP-likeversionofthepseudo-co de

(16)

regfile

(write regfile dest-wb result))

arg1: (ite (or bubble-ex

(not (= src1 dest-ex)))

(read

(ite bubble-wb

regfile

(write regfile dest-wb result))

src1)

(alu op-ex arg1 arg2))

Fig.7.Thetransitionfunctionforthepip elinehasninecomp onents,twoofwhichare

represented by the ab ove expressions. These expressions are automaticallycompiled

fromaLISP-likeversionofthepseudo-co de ingure5.

(ite bubble-ex

#1=(ite bubble-wb

regfile

(write regfile dest-wb result))

(write #1#

dest-ex

(alu op-ex arg1 arg2)))

Fig.8.Thisexpressionrepresentstheabstractionfunctionfromapip elinestatetoa

sp ecication state(only oneexpression isnecessary sincethe registerleis the only

comp onentofasp ecicationstate).The\#1#"notationdenotestheuseofasubterm

that wasdenedearlierinthe expressionbythe \#1="notation.This expressionis

computed automatically from the transition function of the pip eline; the user need

onlysp ecify howmany times to iterate the transition function to ush the pip eline

(17)

#1=(ite bubble-ex

#2=(ite bubble-wb

regfile

(write regfile dest-wb result))

(write #2#

dest-ex

(alu op-ex arg1 arg2)))

(write #1#

dest

(alu op

(read #1# src1)

(read #1# src2))))

Fig.9.Thisexpressionrepresentsthefunctionalcomp ositionoftheabstraction

func-tion and the transition function of the sp ecication. The verier can compute this

comp ositionautomatically.Theresultingfunctioncorresp ondstothesp ecicationside

ofthe commutativediagramingure1.

(ite stall

#1=(ite bubble-ex

#2=(ite bubble-wb

regfile

(write regfile dest-wb result))

(write #2#

dest-ex

#3=(alu op-ex arg1 arg2)))

(write #1#

dest

(alu op

(ite (or bubble-ex

(not (= src1 dest-ex)))

(read #2# src1)

#3#)

(ite (or bubble-ex

(not (= dest-ex src2)))

(read #2# src2)

#3#))))

Fig.10.Thisexpressionrepresentsthefunctionalcomp ositionofthetransition

func-tionoftheimplementationandtheabstractionfunction.Theveriercancomputethis

comp ositionautomatically.Theresultingfunctioncorresp ondsto theimplementation

References

Related documents

The PROMs questionnaire used in the national programme, contains several elements; the EQ-5D measure, which forms the basis for all individual procedure

• Speed of weaning: induction requires care, but is relatively quick; subsequent taper is slow • Monitoring: Urinary drug screen, pain behaviors, drug use and seeking,

○ If BP elevated, think primary aldosteronism, Cushing’s, renal artery stenosis, ○ If BP normal, think hypomagnesemia, severe hypoK, Bartter’s, NaHCO3,

The key segments in the mattress industry in India are; Natural latex foam, Memory foam, PU foam, Inner spring and Rubberized coir.. Natural Latex mattresses are

After successfully supporting the development of the wind power technology, an approach is needed to include the owners of wind turbines in the task of realizing other ways, other

Field experiments were conducted at Ebonyi State University Research Farm during 2009 and 2010 farming seasons to evaluate the effect of intercropping maize with

effect of government spending on infrastructure, human resources, and routine expenditures and trade openness on economic growth where the types of government spending are

Economic development in Africa has had a cyclical path with three distinct phases: P1 from 1950 (where data start) till 1972 was a period of satisfactory growth; P2 from 1973 to