(this version has an appendix not present in the conference version)
Automatic Verication of
Pipelined Microprocessor Control
JerryR.BurchandDavidL.Dill
ComputerScienceDepartment
StanfordUniversity
Abstract. We describ e a technique for verifying the control logic of
pip elinedmicropro cessors.Ithandlesmorecomplicateddesigns,and
re-quires less human intervention, than existingmetho ds. The technique
automaticallycompares apip elinedimplementationto anarchitectural
description.TheCPUtimeneededforvericationisindep endentofthe
data path width, the register le size, and the numb er of ALU op
er-ations. Debugging information is automaticallypro duced forincorrect
pro cessordesigns.Muchofthep owerofthemetho dresultsfroman
e-cientvaliditycheckerforalogicofuninterpretedfunctionswithequality.
Empiricalresultsincludethe vericationofapip elinedimplementation
ofasubsetoftheDLXarchitecture.
1 Introduction
Thedesignofhigh-p erformancepro cessors is averyexp ensiveand comp etitive
enterprise. Thesp eed with whicha designcan b e completed is acrucial factor
indeterminingits success inthemarketplace. Concernab outdesignerrorsis a
majorfactorindesigntime.Forexample,eachmonthofadditionaldesigntime
oftheMIPS4000pro cessor wasestimatedtocost$3-$8million,and27%ofthe
designtimewassp ent in\vericationandtest"[13].
Web elievethat formalvericationmetho ds could eventuallyhavea
signif-icant economic impacton micropro cessor designs by providingfaster metho ds
forcatchingdesignerrors,resultinginfewerdesigniterationsandreduced
simu-lationtime.Formaximumeconomicimpact,avericationmetho dologyshould:
{ b e abletohandle mo dernpro cessor designs,
{ b eapplicabletotheasp ectsofthedesignthataremostsusceptibletoerrors,
{ b e relativelyfastandrequire littlelab or,and
{ provideinformationtohelppinp ointdesignerrors.
Theb est-knownexamplesofformallyveriedpro cessorshaveb eenextremely
simplepro cessordesigns,whichweregenerallyunpip elined[7,8,15,16].The
ver-icationmetho dsusedrelyontheorem-proversthatrequire agreatdealofvery
skilledhumanguidance(thepracticalunitofformeasuringlab orinthesestudies
ascontrolcomplexity.Therearemorerecentvericationtechniques[1,17]that
are much more automatic,but theyhave notb een demonstrated on pip elined
pro cessors.
The verication of mo dern pro cessors p oses a sp ecial problem. The
natu-ralsp ecication ofa pro cessor istheprogrammer-levelfunctionalmo del,called
theinstructionsetarchitecture.Suchasp ecicationisessentiallyanop erational
description of a pro cessor that executes each instruction separately, one cycle
p erinstruction.Theimplementation,onetheotherhand,neednotexecute each
instruction separately; several instruction might b e executing simultaneously
b ecause ofpip elining,etc. Formalvericationrequires provingthat thesp
eci-cationandimplementationareina prop errelationship,butthat relationshipis
notnecess arilyeasytodene.
Recently, there have b een successful eorts to verify pip elined pro cessors
using human-guided theorem-provers[11, 19, 20, 22]. However,in all of these
cases,eitherthepro cessor wasextremelysimpleoralargeamountoflab orwas
required.
Althoughtheexampleswehaveattackedarestillmuchsimplerthancurrent
high-p erformancecommercial pro cessors, they are signicantlyb eyond the
ca-pabilitiesofautomaticvericationmetho dsrep orted previously.Themetho dis
targetedtowardserrorsinthemicropro cessorcontrol,which,accordingtomany
designers, is where most of the bugs usually exist (datapaths are usually not
considered dicult,except for oating p oint op erations). Lab or is minimized,
sincethepro cedure isautomaticexcept forthedevelopmentofthedescriptions
ofthesp ecication andimplementation.Whenthe implementationof the
pro-cessor isincorrect, themetho dcanpro duceasp ecic exampleshowinghowthe
sp ecicationisviolated.
Since wewishto fo cusonthe pro cessor control,we assumethatthe
combi-nationallogicinthedatapathiscorrect.Under thisassumption(whichcanb e
formallycheckedusing existing techniques), the dierences b etween the sp
eci-cation and implementationb ehaviors are entirelyin the timingof op erations
andthetransferofvalues.Forexample,whenthesp ecicationstoresthesumof
tworegisters ina destinationregister, theimplementationmayplacetheresult
ina pip eregister, andnotwritetheresulttoitsdestination untilafteranother
instructionhasb egun executing.
Thelogicwehavechosenisthequantier-freelogicofuninterpretedfunctions
andpredicateswithequalityandprop ositionalconnectives.Uninterpreted
func-tionsareused toreprese ntcombinationalALUs,forexample,withoutdetailing
theirfunctionality.Prop ositionalconnectivesandequalityareusedindescribing
controlinthesp ecication andtheimplementation,andincomparingthem.
The validityproblem forthis logicis decidable.Inpractice,the complexity
isdominatedbyhandlingBo olean connectives,justas withrepresentations for
prop ositionallogicsuchas BDDs[2].However,theadditionalexpressivenessof
ourlogicallowsvericationproblemstob edescrib edatahigherlevelof
Corellahasalsoobservedthat uninterpretedfunctionsandconstantscan b e
usedtoabstractawayfromthedetailsofdatapaths,inordertofo cusoncontrol
issues[9, 10]. Hehas a programfor analyzinglogicalexpressionswhich he has
used for verifying a non-pip elined pro cessor and a prefetch circuit. Although
the details are not presented, his analysis pro cedure app ears to b e much less
ecientthanours,andhedo esnotaddresstheproblemsofsp ecifyingpip elined
pro cessors.
Ourmetho dcanb esplitintotwophases.Therstphasecompilesop erational
descriptions of the sp ecication and implementation,then constructs a logical
formulathatisvalidifandonlyiftheimplementationiscorrectwithresp ec tto
thesp ecication.Thesecondphaseisadecisionpro cedure thatcheckswhether
theformulaisvalid.Thenexttwosectionsdescrib e these twophases. Wethen
giveexp erimentalresultsandconcludingremarks.
2 Correctness Criteria
Thevericationpro cess b egins withthe userprovidingb ehavioraldescriptions
ofan implementationanda sp ecication. For pro cessor verication,thesp
eci-cation describ es how the programmer-visibleparts of the pro cessor state are
up datedwhenone instruction isexecuted everycycle.Theimplementation
de-scriptionshouldb e atthehighestlevelofabstractionthatstillexp osesrelevant
designissues,suchaspip elining.
Eachdescriptionisautomaticallycompiledintoatransitionfunction ,which
takesa state as its rst argument, the current inputs as its second argument,
and returns the next state. The transition function is enco ded as a vector of
symb olicexpressions withone entryforeachstatevariable.AnyHDLcouldb e
usedforthedescriptions,givenanappropriatecompiler.Ourprototyp everier
used a simple HDL based on a small subset of Common LISP. The compiler
translates b ehavioral descriptions intotransition functions through a kind of
symb olicsimulation.
WewriteF
Sp ec andF
Impl
todenotethetransitionfunctionofthesp ecication
andthe implementation,resp ect ively.Werequire that theimplementationand
thesp ecicationhavecorresp ondinginputwires.Thepro cessorswehaveveried
have no explicit output wires since the memory was mo deled as part of the
pro cessor andwedidnotmo del I/O.
Almostallpro cessors haveaninputsettingthat causesinstructionsalready
inthepip elinetocontinueexecutionwhilenonewinstructionsareinitiated.This
istypicallyreferredtoas stallingthepro cessor.IfI
Stall
isaninputcombination
that causes the pro cessor to stall,then the functionF
Impl (1;I
Stall
) represents
theeect ofstallingforonecycle.All instructionscurrentlyin thepip elinecan
b ecompletedbystallingforasucentnumb erofcycles.Thisop erationiscalled
ushing thepip eline,anditis animp ortantpartofourvericationmetho d.
Intuitively,the verier should prove that if the implementationand sp
matchingtheimplementation and sp ecication is the presenc e ofpartially
ex-ecuted instructions in the pip eline. Various parts of the implementationstate
are up dated at dierent stages of the execution of an instruction, so it is not
necessarily p ossible to nd a p oint where the implementation state and the
sp ecication state can b e compared easily.The verier solves this problem by
simulatingtheeect ofcompletingeveryinstructioninthepip elineb eforedoing
thecomparison. The natural wayto complete every instruction isto ush the
pip eline.
All of this is made more precise in gure1.The implementationcan b e in
anarbitrarystateQ
Impl
(lab eled \OldImplState"in the gure).Tocomplete
the partiallyexecuted instructionsin Q
Impl
, the pip elineis ushed, pro ducing
\FlushedOldImplState".Then,allbuttheprogrammer-visiblepartsofthe
im-plementationstatearestripp edo(wedenethefunctionproj forthispurp ose)
topro duceQ
Sp ec
,the\OldSp ecState".BecauseofthewayQ
Sp ec isconstructed fromQ Impl ,wesaythatQ Sp ec matches Q Impl . OldImpl State NewImpl State ? FImpl(1;I) -F Impl (1;I Stall ) -F Impl (1;I Stall ) r r r r r r -F Impl (1;I Stall ) -F Impl (1;I Stall ) FlushedOld ImplState Flushed New ImplState -proj -proj OldSp ec State NewSp ec State ? FSp ec(1;I)
Fig.1.Commutativediagramforshowingourcorrectnesscriteria.
Let I b e an arbitrary input combination to the pip eline (recall that the
sp ecication andthe implementation arerequired tohavecorresp onding input
wires). Let Q 0 Impl = F Impl (Q Impl
;I), the \New Impl State", and let Q 0 Sp ec = F Sp ec (Q Sp ec
;I),the\NewSp ec State".Weconsider theimplementationto
sat-isfythesp ecicationifandonlyifQ 0
Sp ec
matchesQ 0
Impl
.Tocheckthis,ushand
project Q 0
Impl
, thensee ifthe result isequal to Q 0
Sp ec
, as shownatthe b ottom
ofgure1.
It isoften convenientto use a slightlydierent (but equivalent) statement
ofour correctness criteria.In gure1,there are twodierent paths from \Old
Impl State"to \NewSp ec State".The paththat involvesF
Impl
(1;I)is called
the implementation side of the diagram; the path that involvesF
Sp ec
(1;I) is
called the specication side. For each path, there is a corresp onding function
that is the comp ositionof the functions lab eling the arrows on the path. We
saythattheimplementationsatisesthesp ecicationifandonlyifthefunction
func-diagrammust commute.
The reader maynotice that gure1 has thesame form as commutative
di-agrams used with abstraction functions. In our case, the abstraction function
representstheeect ofushinganimplementationstateandthenapplyingthe
proj function.Typicalvericationmetho dsrequirethatthereexistsome
abstrac-tionfunctionthat makesthediagramcommute.Incontrast,werequirethatour
sp ecicabstractionfunctionmakesthediagramcommute.
In some cases, it may b e necessary during verication to restrict the set
of \Old Impl States" considered in gure1. In this case, an invariant could
b e provided(theinvariantmust b eclosed under the implementationtransition
function).Alloftheexamples inthispap erwereprovedcorrect withouthaving
touseaninvariant.
NoticethatthesameinputI isappliedtob oththeimplementationandthe
sp ecication in gure1.This is onlyappropriatein thesimple casewhere the
implementationrequires exactly one cycle p er instruction (once thepip eline is
lled).Ifmorethanonecycleissometimesrequired,then ontheextracyclesit
isnecessary to applyI
Stall
tothe inputsof thesp ecication rather thanI. An
exampleofthisisdiscussed insection4.2.
3 CheckingCorrectness
Asdescrib ed ab ove,toverifyapro cessor wemustcheckwhether thetwo
func-tionscorresp ondingto the twosides ofthediagram ingure1 areequal.Each
ofthetwofunctionscan b ereprese ntedbyavectorofsymb olicexpressions.The
vectors have one comp onent for each programmer-visible state variableof the
pro cessor.These expressions can b e computedecientlybysymb olically
simu-latingthe b ehavioraldescriptions of theimplementationand the sp ecication.
Theimplementationissymb olicallysimulatedseveraltimestomo del theeect
ofushingthepip eline.
Leths 1 ;...;s n iandht 1 ;...;t n
ib evectors ofexpressions. Toverifythat the
functionstheyrepresentareequal,wemustcheckwhether eachformulas
k =t
k
isvalid,for1 kn. Before describingouralgorithm forthis, we dene the
logicwe useto enco dethe formulas.
3.1 UninterpretedFunctionswith Equality
Many quantier-free logics that include uninterpreted functions and equality
have b een studied. Unlike most of those logics[18, 21], ours do es notinclude
additionoranyarithmeticalrelations.Forourapplicationofverifying
micropro-cessor control,there do es not app ear to b e anyneed to have arithmetic built
intothelogic(althoughtheabilitytodeclarecertainuninterpretedfunctions to
b easso ciativeand/orcommutativewouldb euseful).
We b egin by describing a subset of the logic we use. This subset has the
j(htermi=htermi)
jhpredicatesymboli(htermi;...;htermi)
jhpropositionalvariableijtruejfalse
htermi::=ite(hformulai;htermi;htermi)
jhfunction symboli(htermi;...;htermi)
jhtermvariablei:
Notice that the ite op erator can b e used to construct b oth formulas and
terms. We included ite as a primitive b ecause it simplies our case-splitting
heuristicsandb ecause itallowsforecientconstructionoftransitionfunctions
withoutintro ducingauxiliaryvariables.
Thereisnoexplicitquanticationinthelogic.Also,wedonotrequiresp ecic
interpretations forfunctionsymb ols and predicate symb ols.A formulais valid
ifandonlyifitistrueforallinterpretations ofvariables,functionsymb olsand
predicatesymb ols.
Although the ite op erator, together with the constants true and false, is
adequatefor constructingallBo olean op erations, we alsoinclude logical
nega-tionanddisjunctionasprimitivesin ourdecisionpro cedure.Thissimpliesthe
rewriterulesusedtoreduceourformulas,esp eciallyrulesinvolvingasso ciativity
andcommutativityofdisjunction.
Verifyinga pro cessor usuallyrequires reasoningab outstores suchas a
reg-ister le or main memory. We mo del stores as having an unb ounded address
space. If a pro cessor design satises our correctness criteria in this case, then
itis correct foranyniteregister leormemory.Ifcertainconventionsare
fol-lowed,theab ovelogicisadequateforreasoningab outstores.However,wefound
itmore ecienttoaddtwoprimitives,read andwrite,formanipulatingstores.
These primitivesareessentiallythesame astheselect andstoreop erators used
byNelsonandOpp en[18].Ifregleisavariablereprese ntingtheinitialstateof
aregister le,then
write(regle ; addr;data)
represents the storethat resultsfrom writingthe valuedata intoaddress addr
of regle.The valueat address addr in the originalstate of the register le is
denotedby
read(regle ; addr):
Anyexpression that denotesa store,whether itis constructed using variables,
Pseudo-co de for a simpliedversion ofour decisionpro cedure for checking
va-lidity,alongwitha descriptionof itsbasic op eration,is givenin gure2.This
pro cedureisstillpreliminaryandmayb eimprovedfurther,sowewilljustsketch
themainideasb ehind it.
Ourdecisionpro cedure diersinseveralresp ec ts from earlierwork[18,21].
Arithmeticisnotasourceofcomplexityforouralgorithm,sinceitisnotincluded
in our logic.Inour applications, the p otentiallycomplexBo olean structure of
theformulaswecheckistheprimaryb ottleneck.Thus,wehaveconcentratedon
handlingBo oleanstructure ecientlyin practice.
Another dierence is that we are careful to represent formulas as directed
acyclicgraphs(DAGs)withnodistinctisomorphicsubgraphs.Forthistoreduce
thetime complexityof thevaliditychecker,it is necessary to memoize (cache)
intermediateresults. As shownin gure2,the cachingschemeismore
compli-cated than in standard BDD algorithms[2] b ecause formulas must b e cached
relativetoa setofassumptions.
Thenalmajordierence b etweenouralgorithmandpreviousworkisthat
wedonotrequireformulastob erewrittenintoaBo oleancombinationofatomic
formulas. For example, formulas of the form e
1 = e 2 , where e 1 and e 2 may
containanarbitrarynumb erofite op erators, arecheckeddirectly withoutrst
b eingrewritten.
Detlefs and Nelson[12] have recently develop ed a new decision pro cedure
basedona conjunctivenormalformreprese ntationthat app ears tob e ecient
inpractice.Wehavenotyetb een abletodoathoroughcomparison,however.
Ascheckdo esrecursivecaseanalysisontheformulap,itaccumulatesasetof
assumptionsAthatisusedasanargumenttodeep errecursivecalls(seegure2).
Thisset ofassumptionsmust notb ecome inconsistent.Toavoid such
inconsis-tency, we require that if p
0
is the rst result of simplify(p;A
0
), then neither
choose splitting formula(p
0
)noritsnegationislogicallyimpliedbyA
0
.Wecall
thistheconsistencyrequirementonsimplifyandchoosesplittingformula.
Main-tainingtheconsistency requirementismadeeasierbyrestricting thepro cedure
choosesplittingformulatoreturnonlyatomicformulas(formulascontainingno
ite,or,not orwrite op erations).
Aswritteningure2,ouralgorithmisindistinguishablefromaprop ositional
tautologychecker.Dealingwithuninterpretedfunctionsandequalityisnotdone
inthetoplevelalgorithm.Instead,itisdoneinthesimplifyroutine.Forexample,
giventhe assumptionse
1 =e 2 and e 2 =e 3
, itmust b e p ossibleto simplifythe
formula e
1 6= e
3
to false; otherwise,the consistency requirement wouldnotb e
maintained.Asaresult,simplifyandasso ciatedroutinesrequirealargefraction
oftheco de in ourverier.
In spite ofthe consistency requirement,there is signicantlatitudein how
aggressively formulasare simplied. It mayseem b est to doas much
simpli-cationas p ossible,but our exp erimentsindicate otherwise.Wesee tworeasons
forthis.Ifsimplifydo estheminimalamountofsimplicationnecessary tomeet
aggres-var s,p 0 ,p 1 : formula; A 0 ,A 1 ,U 0 ,U 1 ,U:setofformula;
G: setofset offormula;
b egin
ifptrue thenreturn ;;
ifpfalse then
print\notvalid";
terminateunsucces sfully;
G:=cache(p);
if9U 2G suchthatU Athen returnU; /*cachehit*/
/*cachemiss */
s:=cho osesplittingformula(p); /*prepare tocasesplit*/
/*dos falsecase*/ A 0 :=A[f:sg; (p 0 ,U 0 ):=simplify(p,A 0 ); U 0 :=U 0 [ check( p 0 ,A 0
); /*assumptionsusedforsfalsecase*/
/*dos truecase*/ A 1 :=A[fsg; (p 1 ,U 1 ):=simplify(p,A 1 ); U 1 :=U 1 [check( p 1 ,A 1
); /*assumptionsusedforstruecase*/
U :=(U
0
0f:sg)[(U
1
0fsg); /*assumptionsused */
cache(p):=G[fUg; /*addcacheentry*/
returnU;
end;
Fig.2.Thepro cedurecheckterminatessuccessfullyiftheformulapislogicallyimplied
by the set of assumptions A; otherwise, it terminates unsuccessfully. Notall of the
assumptionsinAneedb erelevantinimplyingp;whencheckterminatessuccessfully
it returns those assumptions that were actually used. The set of assumptions used
need notb e one ofthe minimalsubsets of A that impliesp. Checkingwhether p is
valid is done by letting A b e the emptyset. Initially,the global lo okup table cache
returns the emptysetforevery formula p.Later, cache(p) returns the set containing
thoseassumption sets that haveb een sucientto imply pinprevious callsof check.
Thepro cedurechoosesplittingformula heuristicallycho osesaformula tob eusedfor
casesplitting.Thecallsimplify(p;A
0
) returns as itsrst resulta formulaformedby
simplifyingpundertheassumptionsA
0
.ThesecondresultisthesetofformulasinA
0
(resultinginmore callstosimplify),thetotalCPUtimeused mayb ereduced.
Thesecond reasonismoresubtle. Supp osewearecheckingthe validityofa
formula pthat has a lot of sharedstructure when represe nted as a DAG.Our
hop e is that by caching intermediate results, the CPU time typically needed
for validity checking grows with the size of the DAG of p, rather than with
the size of its tree represe ntation. This can b e imp ortant in practice; for the
DLX example (section4.2) it is not unusual for the tree represe ntation of a
formulatob etwoordersofmagnitudelargerthantheDAGrepresentation.The
more aggressive simplify is, the more the shared structure of p is lost during
recursivecaseanalysis,whichapp ears toresultin worsecachep erformance.We
arecontinuingtoexp eriment withdierentkindsof simplicationstrategiesin
ourprototyp eimplementation.
Unlike the algorithm in gure2, our validity checker pro duces debugging
information for invalid formulas.This consists of a satisable set of (p ossibly
negated)atomicformulasthatimpliesthenegationoftheoriginalformula.When
verifyingamicropro cessor, thedebugginginformationcan b eused toconstruct
asimulationvectorthat demonstratesthebug.
There is another imp ortantdierence b etween ourcurrent implementation
andthealgorithmingure2.Let(p
0 ;U
0
)b e theresultofsimplify(p;A
0 ).
Con-trary to the description in gure2, in our implementation U
0
is not required
to b e a subset of A
0
. All that is required is that allof theformulas in U
0 are
logicallyimpliedbyA
0
,andthattheequivalenceofpandp
0
islogicallyimplied
byU
0
.Asa result,somethingmoresophisticatedthan subtractingoutthesets
fsg and f:sg must b e done to compute a U that is weak enoughto b e
logi-callyimpliedbyA(see gure2). Asecond complicationisthatndinga cache
hitrequirescheckingsucientconditionsforlogicalimplicationb etweensets of
formulas,rather than just set containment.However,dealingwith these
com-plicationsseems to b e justied sincethe cachehit ratioisincreased by having
simplifyreturnaU
0
thatisweakerthanitcouldb eifithadtob easubsetofA
0 .
Wearestillexp erimentingwithideasforbalancingthese issuesmore eciently.
4 Experimental Results
Inthissection,wedescrib eempiricalresultsforapplyingourvericationmetho d
toapip elined ALU[5]anda subsetoftheDLXpro cessor[14].
4.1 Pip elinedALU
The3-stage pip elinedALUwe considered (gure3) hasb een used as a b
ench-markforBDD-basedvericationmetho ds[3,4,5,6].Anaturalwaytocompare
thep erformanceofthese metho ds is tosee howthe CPU timeneededfor
veri-cationgrowsasthepip elineis increasedinsizeby(forexample) increasingits
datapathwidth worits register lesize r . ForBurch,ClarkeandLong[4]the
eargrowthin b othwandr . Sublineargrowthinrandsub quadraticgrowthin
wwasachievedbyBryant,BeattyandSeger[3].
Control
Register file
ALU
Op2
Read ports
Write port
Op1
Instruction inputs
Fig.3.3-stagepip elinedALU.Ifthestallinputistrue,thennoinstructionisloaded.
Otherwise,thesrc1andsrc2inputsprovidetheaddressoftheargumentsintheregister
le,theopinputsp eciestheALUop erationtob ep erformed onthearguments,and
thedestinputsp eciesweretheresultistob ewritten.
In our vericationmetho d, the width of the data path and the numb er of
registersand ALUop erationscan b eabstracted away.Asa result,one
verica-tionruncancheckthecontrollogicofpip elineswithanycombinationofvalues
fortheseparameters.Atotalof370millisecondsofCPUtime(runningcompiled
LucidCommonLISP ona DECstation5000/240)isrequired todo acomplete
verication run, including loading and compiling b ehavioral descriptions,
au-tomaticallyconstructing the abstraction function and relatedexpressions, and
checking the validity of the appropriate formula. The validity checking itself,
theprimaryb ottleneckonlargerexamples,onlyrequired50millisecondsforthe
pip elinedALU.
4.2 DLXPro cessor
Hennessy andPatterson[14] designedthe DLXarchitecture to teachthe basic
load word, unconditional jump, conditional branch (branch when the source
register is equal tozero), 3-register ALUinstructions,and ALUimmediate
in-structions.Aswiththepip elinedALUdescrib edearlier,thesp ecicsoftheALU
op erations are abstracted awayin b oth the sp ecication and the
implementa-tion.Thus,ourvericationcoversanysetofALUop erations,assumingthatthe
combinationalALUin thepro cessor hasb eenseparately veried.
Our DLXimplementationhasa standard 5-stagepip eline.The DLX
archi-tecture hasnobranchdelay slot;ourimplementationuses the\assume branch
nottaken"strategy.Nopip eliningisexp osedintheDLXarchitecture orinour
sp ecicationofit.Thus,itistheresp onsibilityoftheimplementationtoprovide
forwardingofdataanda loadinterlo ck.
Theinterlo ckandthelackofa branchdelayslotmeanthat thepip eline
ex-ecutes slightlylessthanone instructionp er cycle,onaverage.Thiscomplicates
\synchronizing" the implementation and the sp ecication during verication,
since the sp ecication executes exactly one instruction p er cycle. Weaddress
theprobleminamannersimilartothatusedbySaxeetal.[20].Theusermust
provide a predicate on the implementation states that indicates whether the
instruction to b e loaded onthe current cycle will actuallyb e executed by the
pip eline. While this predicate can b e quite complicated, it is easy to express
inourcontext,using internalsignals generated bytheimplementation.In
par-ticular,ourpip eline will notexecute the current instruction ifand only ifone
ormore of thefollowingconditionsholds:the stallinputis asserted,the signal
indicatinga takenbranchis asserted,or thesignal indicatingthat the pip eline
hasb eenstalledbytheloadinterlo ckisasserted.
When internal signals are used in this way, it is p ossible for bugs in the
pip elinetoleadtoa falsep ositivevericationresult.Inparticular,thepip eline
mayapp earcorrectevenifitcangetintoastatewhereitrefusestoeverexecute
anotherinstruction(akindoflivelo ck).Toavoidthep ossibilityofafalsep ositive,
we automatically check a progress condition that insures that livelockcannot
o ccur.TheCPUtimeneeded forthischeckisincludedinthetotalgivenb elow.
Oursp ecication hasfourstate variables:theprogramcounter,theregister
le,thedatamemoryandtheinstructionmemory.Ifthedatamemoryandthe
instruction memory are combined into one store in the sp ecication and the
implementation,then theverier willdetect that the pip elinedo es not satisfy
thesp ecicationforcertaintyp esofself-mo difyingco de(thishasb eenconrmed
exp erimentally).Separatingthetwostoresisonewaytoavoidthisinappropriate
negativeresult.
Foreachstatevariableofthesp ecication,theverier constructsan
appro-priate formula and checks its validity. Since neither the sp ecication nor the
implementationwrite to the instruction memory, checking the validity of the
corresp ondingformulaistrivial.Checkingtheformulasfortheprogramcounter,
thedatamemoryandthe registerlerequires 15.5seconds,34seconds and9.5
seconds of CPU time, resp ectively. The total CPU time required for the full
Inanothertest, weintro ducedabugin theforwardinglogicofthepip eline.
Theverierrequiredab out8 secondsto generate3 counter-examples,one each
forthethreeformulasthathadtob echecked.These counter-examplesprovided
sucient conditionson a initialimplementation state where the eects of the
bugwouldb e apparent. Thisinformationcan b e analyzedby hand,or used to
constructa startstatefora simulatorrun thatwouldexp osethebug.
5 Concluding Remarks
The need for improved debugging to ols is now obvious to everyone involved
in pro ducing a new pro cessor implementation. It is equally obvious that the
problem is worsening rapidly: driven by changesin semiconductor technology,
architecturesaremovingsteadilyfromthesimpleRISCmachinesofthe1980s
to-wardsverycomplexmachineswhichaggressivelyexploitconcurrencyforgreater
p erformance.
Althoughwehavedemonstratedthatthetechniquespresentedherecanverify
more complex pro cessors with much less eort than previous work, examples
suchas ourDLXimplementationarestillnotnearly ascomplexascommercial
micropro cessor designs. Wehavealso notyet dealt with memorysystems and
interrupts,whicharerichsourceofbugs inpractice.
It will b e very challenging to increase the capacity of vericationto ols as
quicklyas designers are increasing the scaleof the problem. Clearly, the
com-putationaleciency oflogicaldecisionpro cedures(inpractice,notintheworst
case) will b e a major b ottleneck. If decision pro cedures cannot b e extended
rapidlyenough,itmaystillb e p ossible touse some ofthe sametechniques for
partialvericationorina mixedsimulation/vericationto ol.
References
1. D.L.Beatty. AMethodologyfor FormalHardwareVerication,withApplication
to Microprocessors. PhD thesis, Scho ol of Computer Science, Carnegie Mellon
University,Aug.1993.
2. K.S.Brace,R.L. Rudell,and R.E.Bryant. Ecientimplementation ofaBDD
package. In27thACM/IEEEDesignAutomationConference,1990.
3. R.E. Bryant,D.L.Beatty,and C.-J.H. Seger. Formalhardwarevericationby
symb olicternary trajectoryevaluation. In28thACM/IEEEDesignAutomation
Conference,1991.
4. J.R.Burch,E.M.Clarke,andD.E.Long. Representingcircuitsmoreeciently
insymb olicmo delchecking.In28thACM/IEEEDesignAutomationConference,
1991.
5. J.R.Burch,E.M.Clarke,K.L.McMillan,andD.L.Dill. Sequentialcircuit
ver-icationusingsymb olicmo del checking. In27thACM/IEEEDesignAutomation
NineteenthAnnualACM Symposium on Principleson ProgrammingLanguages,
1992.
7. A.J.Cohn. Apro ofofcorrectnessoftheVip ermicropro cessors:Therstlevel.In
G.Birtwistleand P.A.Subrahmanyam,editors, VLSISpecication,Verication
andSynthesis,pages27{72.Kluwer,1988.
8. A.J. Cohn. Correctness prop erties of the Vip erblo ck mo del: Thesecond level.
In G.Birtwistle,editor, Proceedingsof the1988 DesignVericationConference.
Springer-Verlag,1989. AlsopublishedasUniversityofCambridgeComputer
Lab-oratoryTechnicalRep ortNo.134.
9. F.Corella. Automatedhigh-levelverication againstclo ckedalgorithmic sp
eci-cations. TechnicalRep ortRC18506,IBMResearchDivision,Nov.1992.
10. F.Corella. Automatic high-levelverication against clo ckedalgorithmic sp
eci-cations. InProceedingsof theIFIP WG10.2Conference onComputer Hardware
DescriptionLanguagesandtheirApplications,Ottawa,Canada,Apr. 1993.
Else-vierSciencePublishersB.V.
11. D.Cyrluk. Micropro cessor vericationinPVS:Ametho dologyandsimple
exam-ple. Technical Rep ort SRI-CSL-93-12, SRI Computer Science Lab oratory, Dec.
1993.
12. D.DetlefsandG.Nelson. Personalcommunication,1994.
13. J.L.Hennessy. Designingacomputerasamicropro cessor:Exp erienceandlessons
fromtheMIPS4000. AlectureattheSymp osiumonIntegratedSystems,Seattle,
Washington,March14,1993.
14. J.L. HennessyandD.A.Patterson. Computer Architecture:A Quantitative
Ap-proach.MorganKaufmann, 1990.
15. W.A. Hunt,Jr. FM8501: Averied micropro cessor. Technical Rep ort47,
Uni-versityofTexasatAustin,InstituteforComputingScience,Dec.1985.
16. J.Joyce, G.Birtwistle, and M.Gordon. Proving a computer correct in higher
orderlogic.TechnicalRep ort100,ComputerLab.,UniversityofCambridge,1986.
17. M.Langevin and E.Cerny. Vericationofpro cessor-likecircuits. In P.Prinetto
andP.Camurati,editors,AdvancedResearchWorkshoponCorrectHardware
De-signMethodologies,June1991.
18. G.Nelson and D.C. Opp en. Simplication byco op erating decision pro cedures.
ACMTrans.Prog.Lang.Syst.,1(2):245{257,Oct.1979.
19. A.W. Rosco e. Occam in the sp ecication and verication of micropro cessors.
PhilosophicalTransactionsoftheRoyalSocietyofLondon,SeriesA:Physical
Sci-encesandEngineering,339(1652):137{151 ,Apr.15,1992.
20. J.B.Saxe,S.J.Garland,J.V.Guttag,andJ.J.Horning.Usingtransformations
and verication incircuit design. Technical Rep ort78, DEC Systems Research
Center,Sept.1991.
21. R.E.Shostak. Apracticaldecisionpro cedureforarithmeticwithfunctionsymb ols.
J.ACM,26(2):351{360,Apr.1979.
22. M.Srivas and M.Bickford. Formal verication of a pip elined micropro cessor.
IEEESoftware,7(5):52{64,Sept.1990.
A Illustrative Example
In this app endix, we give a more detailed description of howwe veried the
gures4and5isgeneratedbysimplesyntactictransformationsfromthe
LISP-likedescriptionlanguagethatisusedwithourprototyp everier.Noticethatthe
pip eliningoftheALUiscompletelyabstracted awayinthesp ecication,but is
quiteexplicitinthedescriptionoftheimplementation.
The twob ehavioraldescriptions are automaticallycompiled intotransition
functions(gures 6and7). Thetransitionfunctionsareenco dedbyasso ciating
eachstatevariablewithanexpressionthatrepresen tshowthatstatevariableis
up datedeachcycle. Theexpressions arein thelogicofuninterpreted functions
and equality describ ed earlier. The concrete syntax for expressions in gures
6through10 is that used in our prototyp e verier, which is implemented in
LISP.
The verier automatically pro duces an expression for the appropriate
ab-straction function(gure8). Theabstraction functionand thetransition
func-tionsareused toautomaticallypro duceexpressions (gures9and10)that
cor-resp ond to the two sides of the commutative diagram in gure1. The
imple-mentationiscorrectifandonlyiftheseexpressionsareequivalent,whichcanb e
automaticallyveriedusingthevaliditycheckerdescrib ed earlier.
if stall
then regfile' := regfile;
else
regfile' :=
write(regfile, dest,
alu(op, read(regfile, src1)
read(regfile, src2)));
Fig.4.Sp ecicationforthepip elinedALUingure3.Primedvariablesrefertonext
statevalues;unprimedvariablesrefertocurrentstatevalues.Noticethatpip eliningis
then regfile' := regfile;
else regfile' := write(regfile, dest-wb, result);
bubble-wb' := bubble-ex;
dest-wb' := dest-ex;
result' := alu(op-ex, arg1, arg2);
bubble-ex' := stall;
dest-ex' := dest;
op-ex' := op;
if (not bubble-ex) and (dest-ex = src1)
then arg1' := result';
else arg1' := read(regfile', src1);
if (not bubble-ex) and (dest-ex = src2)
then arg2' := result';
else arg2' := read(regfile', src2);
Fig.5.Implementationofthepip elinedALU.Therstthreelinesofco desp ecifyhow
theregisterleisup datedbythenal(\writeback"or\wb")stageofthepip eline.The
nextthreelinessp ecifyhowthreepip eregistersareup dated bythesecond(\execute"
or \ex") stage. The remaining co de sp ecies how ve additional pip e registers are
up dated bytherststageof thepip eline.Noticethat arg1 isup dated withthenext
state(rather thanthecurrentstate)valueofeitherresultor regle;this isnecessary
fordataforwardingtoworkprop erlyinthe pip eline.
regfile: (ite stall
regfile (write regfile dest (alu op (read regfile src1) (read regfile src2))))
Fig.6.Sincetheregisterleistheonlystatevariableinthesp ecication,the
transi-tionfunctionhasonlyonecomp onent, whichisrepresented bytheab oveexpression.
ThisexpressionisautomaticallycompiledfromaLISP-likeversionofthepseudo-co de
regfile
(write regfile dest-wb result))
arg1: (ite (or bubble-ex
(not (= src1 dest-ex)))
(read
(ite bubble-wb
regfile
(write regfile dest-wb result))
src1)
(alu op-ex arg1 arg2))
Fig.7.Thetransitionfunctionforthepip elinehasninecomp onents,twoofwhichare
represented by the ab ove expressions. These expressions are automaticallycompiled
fromaLISP-likeversionofthepseudo-co de ingure5.
(ite bubble-ex
#1=(ite bubble-wb
regfile
(write regfile dest-wb result))
(write #1#
dest-ex
(alu op-ex arg1 arg2)))
Fig.8.Thisexpressionrepresentstheabstractionfunctionfromapip elinestatetoa
sp ecication state(only oneexpression isnecessary sincethe registerleis the only
comp onentofasp ecicationstate).The\#1#"notationdenotestheuseofasubterm
that wasdenedearlierinthe expressionbythe \#1="notation.This expressionis
computed automatically from the transition function of the pip eline; the user need
onlysp ecify howmany times to iterate the transition function to ush the pip eline
#1=(ite bubble-ex
#2=(ite bubble-wb
regfile
(write regfile dest-wb result))
(write #2#
dest-ex
(alu op-ex arg1 arg2)))
(write #1#
dest
(alu op
(read #1# src1)
(read #1# src2))))
Fig.9.Thisexpressionrepresentsthefunctionalcomp ositionoftheabstraction
func-tion and the transition function of the sp ecication. The verier can compute this
comp ositionautomatically.Theresultingfunctioncorresp ondstothesp ecicationside
ofthe commutativediagramingure1.
(ite stall
#1=(ite bubble-ex
#2=(ite bubble-wb
regfile
(write regfile dest-wb result))
(write #2#
dest-ex
#3=(alu op-ex arg1 arg2)))
(write #1#
dest
(alu op
(ite (or bubble-ex
(not (= src1 dest-ex)))
(read #2# src1)
#3#)
(ite (or bubble-ex
(not (= dest-ex src2)))
(read #2# src2)
#3#))))
Fig.10.Thisexpressionrepresentsthefunctionalcomp ositionofthetransition
func-tionoftheimplementationandtheabstractionfunction.Theveriercancomputethis
comp ositionautomatically.Theresultingfunctioncorresp ondsto theimplementation