Bridging the Gap between Source and High-Level Models 3
Gail C.Murphy and David Notkin KevinSullivan
Dept. ofComputer Science &Engineering Dept. of Computer Science
University ofWashington Universityof Virginia
Box 352350 CharlottesvilleVA, USA 22903
Seattle WA, USA98195 -23 50 [email protected]
fgmurphy, [email protected]
Abstract
Softwareengineersoftenusehigh-levelmo dels(for
in-stance, b ox and arrowsketches) to reason and
com-municate ab out an existing software system. One
problem with high-level mo dels is that they are
al-mostalwaysinaccurate with resp ec t tothe system's
source co de. We have develop ed an approach that
helpsanengineeruseahigh-levelmo delofthe
struc-tureofanexistingsoftwaresystemasalensthrough
whichtoseeamo delofthatsystem'ssourceco de. In
particular,anengineerdenesahigh-levelmo deland
sp ecie showthe mo delmaps to the source. A to ol
thencomputesasoftwarereexionmo delthatshows
wheretheengineer'shigh-levelmo delagreeswithand
whereitdiers fromamo delofthesource.
Thepap erprovidesaformalcharacterizationof
re-exionmo dels,discusse s practicalasp ects ofthe
ap-proach, and relates exp erience s of applying the
ap-proach and to ols to a numb er of dierent systems.
Theillustrativeexampleusedinthepap er describ es
theapplicationofreexionmo delstoNetBSD,an
im-plementationofUnixcomprisedof250,000linesofC
co de. Inonlyafewhours,anengineercomputed
sev-eralreexionmo delsthatprovidedhimwithauseful,
globaloverview of the structur e of theNetBSD
vir-tualmemorysubsystem. Theapproachhasalsob een
appliedto aid inthe understanding and exp
erimen-talreengineering of theMicrosoft Excel spreadshe et
pro duct.
3
Thisresearch wasfunded inpart by theNSF grant
CCR-8858804 and a Canadian NSERC p ost-graduate
scholarship.
0
Permissiontomakedigital/hardcopiesofallorpartofthis
mate-rialwithoutfeeisgrantedprovidedthatthecopiesarenotmadeor
dis-tributedforprotorcommercialadvantage,theACMcopyright/server
notice,thetitleofthepublicationanditsdateapp ear,andnoticeis
giventhatcopyright isbyp ermissionofthe Asso ciationfor
Comput-ingMachinery,Inc. (ACM).Tocopyotherwise,torepublish,top ost
onserversortoredistributetolists,requirespriorsp ecicp ermission
and/orafee.
SIGSOFT'95Washington,D.C.,USA
1 Introduction
Softwareengineersoftenthinkab out anexisting
software system in terms of high-level mo dels.
Boxandarrowsketchesofasystem,forinstance,
are often found on engineers' whiteb oards.
Al-though these mo dels are commonly used,
rea-soningab outthesystemintermsofsuchmo dels
can b edangerous b ecausethemo delsarealmost
always inaccurate with resp ect to the system's
source.
Current reverse engineering systems derive
high-level mo dels from the source co de. These
derived mo dels are useful b ecause they are, by
theirverynature,accuraterepresentationsofthe
source. Although accurate, the mo dels created
by these reverse engineering systems may dier
fromthemo delssketchedbyengineers;an
exam-pleofthisisrep ortedbyWongetal.[WTMS95].
Wehavedevelop edanapproach,illustratedin
Figure 1, that enables an engineer to pro duce
sucientlyaccuratehigh-levelmo delsina
dier-entway. The engineerdenes ahigh-levelmo del
of interest, extracts a source mo del (such as a
call graphoran inheritance hierarchy)from the
source co de, and denes a declarative mapping
b etween the two mo dels. A software reexion
model is thencomputed to determinewhere the
engineer's high-level mo del do es and do es not
agree with the source mo del. 1
An engineer
in-terprets the reexion mo del and, as necessary,
mo dies the input to iteratively compute
addi-tional reexion mo dels.
1
\re-Produces
Input
Input
Input
Produces
Defines
Input
Interpreted By
Reflexion
Model
Tools
Extraction
Tool
Defines
High−level Model
Mapping
Source
Code
Source
Model
Reflexion Model
Figure 1: TheReexion Mo delApproach
In essence, a reexion mo del summarizes a
sourcemo delofasoftwaresystemfromthe
view-p ointofaparticularhigh-levelmo del. Thisform
ofsummarizationisuseful toengineersp
erform-ing a variety of software engineering tasks. In
one case, an engineer atMicrosoft Corp oration,
priortop erformingareengineeringtask,applied
reexion mo dels to help understand the
struc-ture of the Excel spreadsheet pro duct. In
an-other case, computing a sequence of reexion
mo delshighlightedseveralplaceswherecallsb
e-tweenmo dulesviolatedthelayersofthesoftware
architecture that an engineer had p erceived to
exist.
Wehavedevelop edseveraltechniquesfor
sim-plifyingtheengineer'staskof deninghigh-level
mo dels and mappings. We have also develop ed
to olsthatcomputereexionmo delsforlarge
sys-tems in a minute or two. As describ ed in
Sec-tion 2, these techniques and to olshave made it
p ossible for an engineer, in a few hours, tob
et-ter understand a system of several hundreds of
thousands oflinesof co de.
Toclarifythemeaningand computationof
re-exion mo dels, we present a formal
characteri-zation of a reexion mo del system in Section3.
The exibility of our approach is demonstrated
considers the theoretical and practical p
erfor-mance asp ect of theapproach and our to ols. In
Section6,wediscusskeyasp ectsoftheapproach
thatprovidesoftwareengineerswiththe needed
exibility. Section7 considersrelated work.
2 An Example
To convey our basicapproach,we describ e how
adevelop erwithexp ertiseinUnixvirtual
mem-ory (VM) systems used reexion mo dels to
fa-miliarizehimselfwithanunfamiliar
implementa-tion,NetBSD.The systemiscomp osed of ab out
250,000linesofC[KR78]sourceco despreadover
approximately 1900sourceles.
The develop er rst sp ecied a mo del he b
e-lieved,basedonhisexp erience,tob e
characteris-ticofUnixvirtualmemorysystems. Thismo del
consists of a mo dularization of a virtual
mem-ory implementation and thecalls b etween those
mo dules (Figure2a).
A calls relation b etween NetBSD functions
was then computed using the cross-reference
database to ol (xrefdb) of Field [Rei95], and a
smallawk[AKW79]script waswrittento
trans-late the output of Field into our input format.
The extracted calls relation consisted of over
15,000 tuplesover3,000sourceentities. 2
Next,thedevelop erdenedthefollowing
map-ping: [ file=.*pager.* mapTo=Pager ] [ file=vm_map.* mapTo=VirtAddressMaint ] [ file=vm_fault\.c mapTo=KernelFaultHdler ] [ dir=[un]fs mapTo=FileSystem ] [ dir=sparc/mem.* mapTo=Memory ] [ file=pmap.* mapTo=HardwareTrans ] [ file=vm_pageout\.c mapTo=VMPolicy ]
Each linein thisdeclarative map asso ciates
en-titiesinthesourcemo del(ontheleft)with
enti-ties in the high-level mo del (on the right). The
seeming diculty of dening a mapping given
the thousands of entities in theNetBSD source
2
Eachtuplecontainedthenameofthecallingfunction,
neer onlynamed entitiesinsubsystems of
inter-est. For example, the mapping ab ove do es not
consider entities from the I/O subsystem.
Sec-ond, the physical (e.g., directory and le) and
logical (e.g., functions and classes) structure of
thesourceare usedtonamemanysourcemo del
entities in a single line of the mapping. In this
case,onlyphysicalstructureisusedsinceC
pro-vides littlelogicalstructure. Finally,regular
ex-pressions are used to obviate the need to
enu-merate a large set of structures. For instance,
therstlineofthemappingstatesthatall
func-tionsfoundwithinleswhosenameincludes the
stringpagershould b easso ciatedwiththe
high-levelmo del entityPager.
Giventhesethreeinputs,areexionmo delwas
computed to compare the source and high-level
mo dels(Figure2b). Thesolidlinesshowthe
con-vergences, where the source mo del agrees with
the high-level mo del. For instance, as the
de-velop er exp ected, there were calls found in the
sourceb etweenfunctionsinmo dules
implement-ing VMPolicy and functions in mo dules
imple-menting Pager. The dashed arrows show the
divergences, where the source mo del includes
arcs not predicted by the high-level mo del. For
instance, the dashed line from FileSystem to
Pager indicates that functions within mo dules
mapp ed to FileSystem make calls to functions
within mo dules mapp ed to Pager. The dotted
linesshowtheabsences,wherethesourcemo del
do esnotincludearcs predictedbythehigh-level
mo del. For instance, no calls were found b
e-tween mo dules mapp ed to Pager and mo dules
mapp edtoFileSystem. Thenumb er asso ciated
witheacharcinthegureisthenumb erofsource
mo delrelationvaluesmapp edtotheconvergence
or divergence; absence arcs are annotated with
thevaluezero.
Computation and display of the reexion
mo del shown in Figure 2 takes twenty seconds
on a DEC 3000/300 with our to ols, which
con-sistofseveralsmallC++[Str86]programsanda
userinterfaceimplementedinTCL/TK[Ous94].
Computed reexion mo dels are displayed using
AT&T's graphviz package. The to ols allow the
tuples. ThescreensnapshotinFigure3includes
a window \Arc Values" that shows the result
of selecting thedivergenceb etween FileSystem
and Pager. Valuesinthiswindowshowthe
call-ingandcalledfunctions,includingtheirdirectory
and leinformation.
In a one hoursession, the VM develop er was
abletoiterativelysp ecify,computeandinterpret
severalreexion mo dels. (Thesourcemo del was
extracted b eforehand, and it was not changed
during this session.) Informally, the develop er
found the representation of the source co de as
a reexion mo del useful in providing a global
overview of the structure of the NetBSD
im-plementation. For example, from studying sp
e-cic divergences, the develop er concluded that
the implementationof FileSystemincluded
op-timizationsthatrelyoninformationfromPager.
Thisinformationisuseful forplanningmo
dica-tions toeithermo dule.
3 Forma l Characterization
To make precise the meaning and computation
of a reexion mo del, we present a formal sp
eci-cation of a reexion mo del system using the Z
sp ecication language[Spi92].
3.1 Reexion Model
A static schema in Z describ es a state the
sys-tem can o ccupy as well as invariantsthat must
b e maintainedacross statetransitions. Any
sys-tem implementing the reexion mo del approach
must maintain the state comp onents and
in-variantsdescrib edintheReexionModel schema
presented b elow.
The ReexionModel schema uses two basic
typ es, HLMENTITY,whichrepresentsthetyp e
of a high-level mo del entity and SMENTITY,
which represents the typ e of a source mo del
entity. Four typ e synonyms are also dened:
HLMRelation and SMRelation, which dene
re-lationsover high-levelmo del entitiesand source
mo del entities resp ectively, and HLMTuple and
Memory
Pager
FileSystem
User
VMPolicy
HardwareTrans
KernelFaultHdler
VirtAddressMaint
Memory
HardwareTrans
0
0
51
VMPolicy
6
KernelFaultHdler
0
6
3
Pager
3
8
2
VirtAddressMaint
2
2
User
0
0
0
10
4
0
61
7
6
3
21
FileSystem
0
16
9
11
575
Module
Calls
Module
Absence
Convergence
Divergence
(a) High−level Model
(b) Reflexion Model
[HLMENTITY;SMENTITY]
HLMRelation ==HLMENTITY $HLMENTITY
SMRelation==SMENTITY $SMENTITY
HLMTuple==HLMENTITY2HLMENTITY
SMTuple==SMENTITY 2SMENTITY
ReexionModel
convergences :HLMRelation
divergences:HLMRelation
absences :HLMRelation
mappedSourceModel :HLMTuple $SMTuple
convergences\divergences=?
divergences\absences=?
convergences\absences=?
dommappedSourceModel =
convergences[divergences
The variables (ab ove the dividingline) in the
schema dene the information that must b e
maintained by a reexion mo del system. The
relation describ ed by the convergences variable
denes where a high-level mo del agrees with
by the divergences variable describ es where a
source mo del diers from a high-level mo del;
and therelation describ edbytheabsences v
ari-able denes where a high-level mo del diers
from a source mo del. The fourth variable,
mappedSourceModel, denes a relation that
de-scrib es which source mo del values contribute to
a convergent or divergent arc in the reexion
mo del. The information inmappedSourceModel
is used to supp ort op erations that aid an
engi-neer in interpreting a reexion mo del. For
ex-ample,theinformationinmappedSourceModel is
necessaryto supp ort a queryof the form shown
inFigure 3.
Inaddition todeclaringthestatecomp onents
of a reexion mo del, the staticschema includes
the denition of constraints (b elow the
divid-ing line) that must b e satised by all reexion
mo del systems. The rst three constraintsstate
that thevalues of the convergences , divergences
and absences relations are disjoint. The fourth
constraint states that the investigation of the
have nocontributing sourcemo del values.
3.2 Computing a Reexion Model
The dynamic schema, ComputeReexionModel,
presentedb elow,describ es thecomputationof a
reexion mo del from three inputs: a high-level
mo del, a source mo del, and a mapping from
the source to the high-level mo del. The
high-levelmo del(hlm?)isdescrib edasarelationover
high-level mo del entities,and the source mo del
(sm?)isdescrib edasarelationoversourcemo del
entities. The mapping (map?) is an ordered
list of map entries. Each map entry|dened
by the typ e MapEntry|names zero or more
source mo del entities and asso ciates with these
entities, one or more high-level mo del entities. 3
SMENTITYDESC represents the typ e of a
de-scription namingzero or moresourcemo del
en-tities(e.g., aregular expression over logicaland
physical softwarestructureinourto ols).
[SMENTITYDESC] MapEntry == SMENTITYDESC2(PHLMENTITY) ComputeReexionMod el 1ReexionModel hlm?:HLMRelation sm?:SMRelation
map?:seqMapEntry
mapFunc :(seqMapEntry2SMENTITY)!
(PHLMENTITY) mappedSourceModel 0 = S ft : SMTuple j t 2 sm? (mapFunc(map?;rstt)2
mapFunc(map?;secondt))2
ftgg convergences 0 = hlm? \(dommappedSourceModel 0 ) divergences 0 = (dommappedSourceModel 0 ) nhlm? absences 0 =hlm? n (dommappedSourceModel 0 ) 3
Anadditionalinvariantwhichconstrainsamapentry
to name onlyhigh-level mo del entities sp eciedin hlm?
function (mapFunc) that matches entities from
thesourcemo deltothesp eciedmap,pro ducing
aset ofasso ciatedhigh-levelmo del entities.
ThevalueofmappedSourceModel iscomputed
by pushing the elements of each source mo del
tuple through the map, resulting in two sets of
high-level mo del entities. The cross-pro duct of
these sets is taken and each element in the
re-sultant setis asso ciated(through another
cross-pro duct) with the original source mo del tuple.
Once the mappedSourceModel is computed, the
values of the convergences , divergences, and
absences relationsareeasilydeterminedthrough
set intersection and setdierence op erations.
3.3 A Family of Reexion Models
This Z sp ecication denes a family of
reex-ion mo del systems. Dierent kinds of reexion
mo del systems result dep ending on the choices
madeforrepresentingthesourcemo delentity
de-scriptionsinamap(SMENTITYDESC)andfor
dening the mapping function(mapFunc). Our
reexion mo del system describ es source mo del
entities using a combination of structural
infor-mation and regular expressions. Our to ols
sup-p ortthe useof twodierent mappingfunctions.
The most common mapping function used
pro-duces theset of high-levelmo del entities asso
ci-atedwiththerstmatchofagivensourcemo del
entitytoanentryinthemap. Alternatively,our
to olssupp orttheuseofamappingfunctionthat
returnsthesetofhigh-levelmo delentities
result-ing from the union of all matches found in the
map. Ourto olscanthusb econguredtoprovide
implementationsoftwodierentreexionmo del
systems.
4 Experience
The reexion mo del approach has b een applied
to aid engineers inp erforming a variety of
soft-ware engineering tasks on a numb er of dierent
implemen-crosoft Corp oration applied reexion mo dels to
assess the structure of the Excel spreadsheet
pro duct prior to a reengineering activity. The
Excel pro duct consists of over one millionlines
ofCsourceco de. Theengineercomputed
reex-ion mo delsseveraltimesa dayover afour week
p erio dtoinvestigatethecorresp ondenceb etween
ap ositedhigh-levelmo delandamo delofalmost
120,000callsandglobaldatareferencesextracted
from the source. A detailed mapping le
con-sistingof almost1000 entrieswaspro duced and
is b eing used to guide the extraction of comp
o-nents from the source. The engineer found the
approach valuable for understanding the
struc-ture of Exceland planning thereengineering
ef-fort.
DesignConformance Weusedasequenceof
reexion mo dels to compare the layered
archi-tecturaldesignofGriswold'sprogram
restructur-ing to ol [GN95] with a source mo del consisting
of calls b etween mo dules. The reexion mo del
highlighted, as divergences, a few cases where
mo dulesinthesourceco dedidnotadheretothe
layeringprinciples. Weareunawareofanyother
approachthat would allowsuch violationstob e
found sodirectly.
An industrial partner applied reexion mo
d-els tocheckifa6,000lineC++ implementation
of a subsystem matched design do cumentation
(intheformofaBo o chobjectdiagram[Bo o91])
preparedpriortoimplementation. Thiscasewas
uniqueinthatthereexion mo delwasfully
con-vergent withthesourcemo del.
System Understanding We used reexion
mo delstotrytodetermine whya compilerused
in undergraduate education at the University
of Washington was dicult for the students to
change. We computed a reexion mo del
com-paring an extracted Ada le imp orts relation
withaconventionalmo delofacompiler[PW92].
Thereexionmo delcontainedmeaningful
diver-gences b etween almostallpairsofhigh-level
en-tities. This high degree of coupling explains, in
part, why the students had diculty changing
version of the compiler, written in C++. The
reexion mo dels for this compiler were far less
cluttered than the Ada version. However, some
unexp ected interactionswereidentiedusingthe
reexion mo del;these mayprovidethe basisfor
either minor restructuring of thecompiler orat
least additionalwarningstothestudents.
5 Performance
Engineers iteratively sp ecify, compute, and
in-terpret reexion mo dels. The rate at which an
engineercan interpretanditeratereexionmo
d-els is dep endent, in part, up onthe the sp eed of
the computation.
The formal characterization presentedin
Sec-tion 3 provides a basis for considering the
theoretical complexity of computing a
reex-ion mo del. From the dynamic Z schema,
ComputeReexionModel, we can see that the
time complexityof computinga reexionmo del
is dep endent up on the cost of computing the
mappedSourceModel relation and the cost of
comparing that computed relation to the
high-levelmo del. Anupp er b ound,then,on the
com-plexityofcomputingthemappedSourceModel
re-lation isgivenby:
O(#sm 2 #map 2t
comparison
) 2O((#hlm) 2
)
where#sm isthecardinalityofthesourcemo del
relation,#map isnumb er of entries inthemap,
t
comparison
is the cost of comparing a source
mo del entity to a source mo del entity
descrip-tion ina mapentry,and #hlmisthecardinality
of the high-levelmo del relation. Since the
num-b er of entities in the high-level mo del is
gener-ally smalland constant,theO((#hlm) 2
) can,in
practice, b eignored,yielding:
O(#sm 2 #map 2 t
comparison )
Our initial implementation of to ols for
com-puting reexion mo dels p erformed the
straightfor-ciently fast for mo derately large systems (with
sourcemo delsconsistingof tens of thousandsof
tuples) and small maps (tens of lines), but was
notfast enough tosupp ort theiterative
compu-tation of reexion mo dels for larger systems or
larger maps. For example, an early version of
our to ols required 40 minutes on a Pentium to
computea reexion mo del forExcel.
By trading space for time inthe
implementa-tion of our to ols, we have b een able to supp ort
the computation of reexion mo dels for large
software systemsand large maps ina minute or
two (see Table 1). Sp ecically, our to ols hash
thematchofhigh-levelmo delentitiesforagiven
sourcemo delentitythersttimeasourcemo del
entityisseen. Theadditionalspacerequirements
dep endup onthenamingschemeusedforsource
mo delentitiesand thenumb erofuniqueentities
in the source mo del. In the case of Excel and
the naming scheme used byour to ols, there are
18,118uniquesourcemo del entitieseach
requir-ingon theorder of 100bytes (lessthan2 Mb in
total). Our to olsprovidea varietyof options to
lettheengineerdeterminetheappropriatespace
timetradeo.
6 Discussion
Reexionmo delsp ermitanengineertoeasily
ex-plore structural asp ects of a large software
sys-tem. Thegoaloftheapproachistoprovide
engi-neerswiththeexibilitytopro duce, atlow-cost,
high-level mo dels that are \go o d enough" for
p erformingaparticularsoftwareengineeringtask
(restructuring, reengineering, or p orting, etc.).
Three asp ects of the approach critical to
meet-ingthisgoalaretheuseofsyntacticmo dels, the
useof expressive declarativemaps, and supp ort
forqueryinga reexion mo del.
Syntactic Mo dels Asdescrib edintheformal
characterization,reexion mo delsare computed
withoutanyknowledgeoftheintendedsemantics
ofthehigh-levelorthesourcemo del. Ab enetof
thissyntacticapproach isthe abilityof an
engi-ware system (calls, data dep enden ces, or event
interactions,etc.). It also means, however, that
it is the engineer's resp onsibilityto ensure that
it makes sense to compare a selected high-level
mo del with an extracted source mo del. For
ex-ample, comparing the sp ecied calls diagram of
Figure 2awithanextractedcallsrelationmakes
sense, but comparingthe samehigh-levelmo del
with a source mo del representing the #include
structure of NetBSD would probably b e
mean-ingless.
Inpractice,engineers haveexploitedthis
ex-ibility by changing the meaning of their mo
d-els over time. For example, b oth the VM
de-velop er and the Microsoft engineer rst used
a calls source mo del and later augmented the
source mo dels with static dep endence s of
func-tion denitionsonglobaldata. Byadding static
dep ende nces, b oth develop ers implicitly shifted
their high-levelmo del from a callsdiagramto a
communicates-withdiagram. In b oth cases, the
changes weredriven by theneed tounderstand,
for thetaskb eingp erformed, additionalasp ects
of thesystem structure.
To aid the engineer in interpreting reexion
mo delscomputed withmo delscontaining
dier-ent kinds of information,we are addingsupp ort
fortypingrelationsinb othsourceandhigh-level
mo dels.
Maps The declarative maps used in
comput-ingareexionmo delenableanengineertofo cus
on information of interest in the source in two
ways. First, an engineer may sp ecify a partial
mapthat containsentriesforonlythose partsof
thesystemrelevant tothetaskathand. Second,
an engineer may iterativelyrene a map to the
appropriate levelofdetail necessaryforthetask
b eing p erformed.
Generally, an initial reexion mo del is
com-puted with a rough and partial map. Then,
basedon aninvestigationof thereexion mo del,
an engineer renes the map in the areas of
in-terest untilthenecessary informationab out the
system is obtained. Sometimes, as was thecase
Linesof Mo del Mo del 100MHz
Co de (Tuples) (Lines) (Entities) (min:sec) (min:sec)
UW compiler C++ 3,700 607 33 5 :00.5 :01 UW compiler Ada 4,200 72 9 5 :00.2 :01 Restructur ing To ol CLOS 47,000 5,855 215 9 :04.3 :06 NetBSD (VM) C 250,000 15,657 7 8 :04.0 :19 Excel C 1,200,000 119,637 971 15 2:13.0 4:05
Table1: Performanceof ReexionMo del To ols
fairly rough map was sucient. In contrast, in
thecase ofExcel,a detailedmapwasdesired to
planreengineering activities.
The reexion mo del approach enables an
en-gineer to balance the cost of rening the map
with thelevelof detail necessary forp erforming
a particularsoftware engineering task. We plan
totracktheuseofsomedeclarativemapsacross
theevolutionofseveralsystemstodeterminethe
degree ofsensitivityofourmappinglanguageto
changesmadeinthesource. Thiswillaidan
en-gineer in judging the amortization costs of
cre-atingdetailedmaps forlargesystems.
QueryingReexionMo dels Reexionmo
d-els bridge the gap b etween an engineer's
high-level mo del and a mo del of the source. The
convergences,divergences,andabsences
summa-rize selected interactions in the source, while
the mappedSourceModel captures the
connec-tions b etween the high-level and source mo del
arcs. Based on the summary information, the
engineer intersp ersestwo kindsof queries to
in-terpret a reexion mo del for a sp ecic software
engineeringtask.
In the rstkind of query, an engineer
investi-gates thesource mo del values contributing to a
convergence or divergence. In the second kind,
an engineer p erforms queries to determine the
source mo del entities and values that were not
included in thereexion mo del. This query
en-ables an engineer to assess whether the map is
absencesinthecomputedreexionmo delarethe
result of incompletenessinthemap.
Based on theresults of the interpretation,an
engineer mayeitherdecide torene one ormore
of the inputs, computing a subsequent reexion
mo del,orelsemaydecidethatsucient
informa-tion hasb eenobtainedtopro ceedwiththe
over-alltask. Tob ettersupp ortanengineerinthe
in-terpretationpro cess,wearecurrentlydeveloping
techniques to improve thequerying and
investi-gationof a seriesofcomputed reexion mo dels.
7 Related Work
Reverse Engineering The reverse
engineer-ing approaches closestto ours use clustering
in-formation,whichisgenerally culledfroma
com-bination of human input and numerical
com-putation, to create abstract representations for
theengineer. Examples ofthisapproachinclude
Rigi [MK89] and Schwanke's statistically-based
architectural recoverytechnique [Sch91].
Reexion mo dels dier in a numb er of ways.
First,in ourapproach theengineer sp ecies the
high-level entities explicitly, whereas the
archi-tectural recovery systems instead infer them.
Second, we fo cus on comparing high-level and
source mo dels, ratherthan on discovering
high-level mo dels. Third, our mappings are
declara-tive,asso ciatingsourceandhigh-levelentities,in
engineer to build explicit hierarchicalmo dels of
the software structure of a system. We instead
supp ort the investigation of substructure (i.e.,
a subsystem) by computing dierent reexion
mo delsatvariouslevels of abstraction.
Mo del Comparison Ossher has considered
the comparison of relational mo dels of software
structure for a xed typ e of high-level mo del, a
GRID[Oss84]. TheintentofaGRIDdescription
of a system is to \sp ecify, represent, do cument
and enforce the structure of large, layered
sys-tems" [Oss87, pg. 219]. The GRID mechanism
p ermits an engineertocho osea concise
descrip-tion ofthesoftwarestructureand thento
anno-tate how the actual structure of the system
de-viatesfromthedescription. Ourapproachisnot
intendedtosp ecifyorenforceadesiredstructure
ofthesystem,butrathertocomparetwomo dels
of the structure using an explicit mapping b
e-tween the twomo delsprovidedby theengineer.
Jackson's Asp ect system [Jac93] compares
partial program sp ecications (high-level mo
d-els) to data ow mo dels extracted from the
source to detect bugs in the source co de that
cannot b e detected using static typ e checking.
Asp ectuses dep endenc es b etween datastores as
a mo del of the b ehavior of a system and
as-sumesthatthismo deliscorrect,fo cusingonhow
thesourcemo del diersfromthep ositedb
ehav-ior. Incontrast,wefo cusonhigh-levelstructural
mo dels, and we are interested in b oth how the
source mo del diers from the high-level mo del
and also how the high-level mo del diers from
thesource.
Jackson has also develop ed a semantic
dier-ence approach for comparing the dierences in
inputand outputb ehaviorb etween twoversions
of a pro cedure [JL94]. This approach derives
an approximate mo del of the semantic eect of
a pro cedure consistingof a binary relation that
summarizesthedep ende nceofvariablesafter
ex-ecutingthepro cedureup onthevalueofvariables
at the entry to the pro cedure. The semantic
dierencing to ol compares the binary relations
resulting from dierent versionsof a pro cedure.
declarativemapping.
8 Summary
A reexion mo del summarizes information
ex-tracted fromsourceco deintoahigh-levelmo del
that is suciently accurate to supp ort an
engi-neer in p erforming a software engineering task.
The engineer denes three inputs to a reexion
mo delcomputation: ahigh-levelmo del,asource
mo del,and amap. Thereexion mo delpresents
the summary information in the context of the
high-level mo del dened by the engineer. The
engineerinterpretsanditerativelycomputes
suc-cessive reexion mo delsuntil satised. The
fea-sibilityandexibilityof theapproachhaveb een
demonstrated throughitsapplicationin a
num-b er ofdierentsettingson systemsrangingfrom
severalthousandtooveronemillionlinesofco de.
Acknowledgments
Rob ert Allen, Kingsum Chow, David Garlan,
BillGriswold,MichaelJackson, KurtPartridge,
Bob Schwanke, and Michael VanHilst each
pro-vided helpful comments on earlier drafts of
the pap er. Conversations with Daniel Jackson
help ed clarifya numb er of asp ects of our work.
AnanonymousMicrosoftengineerapplied
reex-ion mo dels to Excel. Dylan McNamee was our
VMdevelop er. PokWongappliedreexionmo
d-elsaspartofadesignconformancetask.Stephen
NorthofAT&Tprovidedthegraphvizgraph
dis-play and editing package. We also thank the
anonymous referees for their constructive
com-ments.
References
[AKW79] A.V. Aho, B.W. Kernighan, and P.J.
Weinb erger. Awk { A Pattern
Scan-ningandPro cessing Language. Software
{Practiceand Experience,9(4):267{280,
1979.
[Bo o91] G. Bo o ch. Object-oriented Design
turalTradeosforaMeaning-Preserving
Program Restructuring To ol. IEEE
Transactions on Software Engineering,
21(4):275{287,April1995.
[Jac93] D.Jackson. Abstract Analysis with
As-p ect. In Proceedings of the 1993
Inter-nationalSymposiumonSoftwareTesting
and Analysis,pages19{27,1993.
[JL94] D. Jackson and D.A. Ladd. Semantic
Di: A To ol For Summarizing the
Ef-fects ofMo dications. In Proceedings of
theInternationalConferenceonSoftware
Maintenance,Septemb er 1994.
[KR78] B.KernighanandD.Ritchie.TheCPr
o-grammingLanguage.PrenticeHall,1978.
[MK89] H.A.M uller and K. Klashinsky. A
Sys-tem for Programming-in-the-large. In
Proceedings of the 10th International
Conference on Software Engineering,
pages 80{86. IEEE Computer So ciety
Press,April1989.
[Oss84] H.L.Ossher.ANewProgramStructuring
Mechanism Based on Layered Graphs.
PhDthesis,StanfordUniversity,
Decem-b er1984.
[Oss87] H. Ossher. A Mechanism for Sp
eci-fying the Structure of Large, Layered
Systems. In Bruce Shriver and
Pe-terWegner,editors,Research Directions
in Object-Oriented Programming, pages
219{252.MITPress,1987.
[Ous94] J.K.Ousterhout.TCL&theTKToolkit.
Addison-Wesley,1994.
[PW92] D.E. Perry and A. Wolf.
Founda-tionsfortheStudyofSoftware
Architec-ture. ACM SoftwareEngineering Notes,
17(4):40{52,Octob er 1992.
[Rei95] S.P. Reiss. The FieldProgramming
En-vironment: A Friendly Integrated
Envi-ronment for Learning and Development.
KluwerAcademic Publishers,1995.
[Sch91] R.Schwanke.An IntelligentTo olfor
Re-engineeringSoftwareMo dularity.In
Pro-ceedings of the 13th International
Con-ference on Software Engineering, pages
83{92,May1991.
[Smi84] B.C. Smith. Reection and Semantics
guages Conference, pages 23{35. ACM,
Decemb er 1984.
[Spi92] J.M. Spivey. The Z Notation. Prentice
Hall,second editionedition,1992.
[Str86] B. Stroustrup. C++Programming
Lan-guage. Addison-Wesley,1986.
[WTMS95] K. Wong, S.R. Tilley, H.A.M uller, and
M.D. Storey. Structural Redo
cumenta-tion: A Case Study. IEEE Software,