• No results found

Software Reexion Models: Bridging the Gap between Source and High-Level Models 3. Box Charlottesville VA, USA

N/A
N/A
Protected

Academic year: 2021

Share "Software Reexion Models: Bridging the Gap between Source and High-Level Models 3. Box Charlottesville VA, USA"

Copied!
11
0
0

Loading.... (view fulltext now)

Full text

(1)

Bridging the Gap between Source and High-Level Models 3

Gail C.Murphy and David Notkin KevinSullivan

Dept. ofComputer Science &Engineering Dept. of Computer Science

University ofWashington Universityof Virginia

Box 352350 CharlottesvilleVA, USA 22903

Seattle WA, USA98195 -23 50 [email protected]

fgmurphy, [email protected]

Abstract

Softwareengineersoftenusehigh-levelmo dels(for

in-stance, b ox and arrowsketches) to reason and

com-municate ab out an existing software system. One

problem with high-level mo dels is that they are

al-mostalwaysinaccurate with resp ec t tothe system's

source co de. We have develop ed an approach that

helpsanengineeruseahigh-levelmo delofthe

struc-tureofanexistingsoftwaresystemasalensthrough

whichtoseeamo delofthatsystem'ssourceco de. In

particular,anengineerdenesahigh-levelmo deland

sp ecie showthe mo delmaps to the source. A to ol

thencomputesasoftwarereexionmo delthatshows

wheretheengineer'shigh-levelmo delagreeswithand

whereitdiers fromamo delofthesource.

Thepap erprovidesaformalcharacterizationof

re-exionmo dels,discusse s practicalasp ects ofthe

ap-proach, and relates exp erience s of applying the

ap-proach and to ols to a numb er of dierent systems.

Theillustrativeexampleusedinthepap er describ es

theapplicationofreexionmo delstoNetBSD,an

im-plementationofUnixcomprisedof250,000linesofC

co de. Inonlyafewhours,anengineercomputed

sev-eralreexionmo delsthatprovidedhimwithauseful,

globaloverview of the structur e of theNetBSD

vir-tualmemorysubsystem. Theapproachhasalsob een

appliedto aid inthe understanding and exp

erimen-talreengineering of theMicrosoft Excel spreadshe et

pro duct.

3

Thisresearch wasfunded inpart by theNSF grant

CCR-8858804 and a Canadian NSERC p ost-graduate

scholarship.

0

Permissiontomakedigital/hardcopiesofallorpartofthis

mate-rialwithoutfeeisgrantedprovidedthatthecopiesarenotmadeor

dis-tributedforprotorcommercialadvantage,theACMcopyright/server

notice,thetitleofthepublicationanditsdateapp ear,andnoticeis

giventhatcopyright isbyp ermissionofthe Asso ciationfor

Comput-ingMachinery,Inc. (ACM).Tocopyotherwise,torepublish,top ost

onserversortoredistributetolists,requirespriorsp ecicp ermission

and/orafee.

SIGSOFT'95Washington,D.C.,USA

1 Introduction

Softwareengineersoftenthinkab out anexisting

software system in terms of high-level mo dels.

Boxandarrowsketchesofasystem,forinstance,

are often found on engineers' whiteb oards.

Al-though these mo dels are commonly used,

rea-soningab outthesystemintermsofsuchmo dels

can b edangerous b ecausethemo delsarealmost

always inaccurate with resp ect to the system's

source.

Current reverse engineering systems derive

high-level mo dels from the source co de. These

derived mo dels are useful b ecause they are, by

theirverynature,accuraterepresentationsofthe

source. Although accurate, the mo dels created

by these reverse engineering systems may dier

fromthemo delssketchedbyengineers;an

exam-pleofthisisrep ortedbyWongetal.[WTMS95].

Wehavedevelop edanapproach,illustratedin

Figure 1, that enables an engineer to pro duce

sucientlyaccuratehigh-levelmo delsina

dier-entway. The engineerdenes ahigh-levelmo del

of interest, extracts a source mo del (such as a

call graphoran inheritance hierarchy)from the

source co de, and denes a declarative mapping

b etween the two mo dels. A software reexion

model is thencomputed to determinewhere the

engineer's high-level mo del do es and do es not

agree with the source mo del. 1

An engineer

in-terprets the reexion mo del and, as necessary,

mo dies the input to iteratively compute

addi-tional reexion mo dels.

1

(2)

\re-Produces

Input

Input

Input

Produces

Defines

Input

Interpreted By

Reflexion

Model

Tools

Extraction

Tool

Defines

High−level Model

Mapping

Source

Code

Source

Model

Reflexion Model

Figure 1: TheReexion Mo delApproach

In essence, a reexion mo del summarizes a

sourcemo delofasoftwaresystemfromthe

view-p ointofaparticularhigh-levelmo del. Thisform

ofsummarizationisuseful toengineersp

erform-ing a variety of software engineering tasks. In

one case, an engineer atMicrosoft Corp oration,

priortop erformingareengineeringtask,applied

reexion mo dels to help understand the

struc-ture of the Excel spreadsheet pro duct. In

an-other case, computing a sequence of reexion

mo delshighlightedseveralplaceswherecallsb

e-tweenmo dulesviolatedthelayersofthesoftware

architecture that an engineer had p erceived to

exist.

Wehavedevelop edseveraltechniquesfor

sim-plifyingtheengineer'staskof deninghigh-level

mo dels and mappings. We have also develop ed

to olsthatcomputereexionmo delsforlarge

sys-tems in a minute or two. As describ ed in

Sec-tion 2, these techniques and to olshave made it

p ossible for an engineer, in a few hours, tob

et-ter understand a system of several hundreds of

thousands oflinesof co de.

Toclarifythemeaningand computationof

re-exion mo dels, we present a formal

characteri-zation of a reexion mo del system in Section3.

The exibility of our approach is demonstrated

considers the theoretical and practical p

erfor-mance asp ect of theapproach and our to ols. In

Section6,wediscusskeyasp ectsoftheapproach

thatprovidesoftwareengineerswiththe needed

exibility. Section7 considersrelated work.

2 An Example

To convey our basicapproach,we describ e how

adevelop erwithexp ertiseinUnixvirtual

mem-ory (VM) systems used reexion mo dels to

fa-miliarizehimselfwithanunfamiliar

implementa-tion,NetBSD.The systemiscomp osed of ab out

250,000linesofC[KR78]sourceco despreadover

approximately 1900sourceles.

The develop er rst sp ecied a mo del he b

e-lieved,basedonhisexp erience,tob e

characteris-ticofUnixvirtualmemorysystems. Thismo del

consists of a mo dularization of a virtual

mem-ory implementation and thecalls b etween those

mo dules (Figure2a).

A calls relation b etween NetBSD functions

was then computed using the cross-reference

database to ol (xrefdb) of Field [Rei95], and a

smallawk[AKW79]script waswrittento

trans-late the output of Field into our input format.

The extracted calls relation consisted of over

15,000 tuplesover3,000sourceentities. 2

Next,thedevelop erdenedthefollowing

map-ping: [ file=.*pager.* mapTo=Pager ] [ file=vm_map.* mapTo=VirtAddressMaint ] [ file=vm_fault\.c mapTo=KernelFaultHdler ] [ dir=[un]fs mapTo=FileSystem ] [ dir=sparc/mem.* mapTo=Memory ] [ file=pmap.* mapTo=HardwareTrans ] [ file=vm_pageout\.c mapTo=VMPolicy ]

Each linein thisdeclarative map asso ciates

en-titiesinthesourcemo del(ontheleft)with

enti-ties in the high-level mo del (on the right). The

seeming diculty of dening a mapping given

the thousands of entities in theNetBSD source

2

Eachtuplecontainedthenameofthecallingfunction,

(3)

neer onlynamed entitiesinsubsystems of

inter-est. For example, the mapping ab ove do es not

consider entities from the I/O subsystem.

Sec-ond, the physical (e.g., directory and le) and

logical (e.g., functions and classes) structure of

thesourceare usedtonamemanysourcemo del

entities in a single line of the mapping. In this

case,onlyphysicalstructureisusedsinceC

pro-vides littlelogicalstructure. Finally,regular

ex-pressions are used to obviate the need to

enu-merate a large set of structures. For instance,

therstlineofthemappingstatesthatall

func-tionsfoundwithinleswhosenameincludes the

stringpagershould b easso ciatedwiththe

high-levelmo del entityPager.

Giventhesethreeinputs,areexionmo delwas

computed to compare the source and high-level

mo dels(Figure2b). Thesolidlinesshowthe

con-vergences, where the source mo del agrees with

the high-level mo del. For instance, as the

de-velop er exp ected, there were calls found in the

sourceb etweenfunctionsinmo dules

implement-ing VMPolicy and functions in mo dules

imple-menting Pager. The dashed arrows show the

divergences, where the source mo del includes

arcs not predicted by the high-level mo del. For

instance, the dashed line from FileSystem to

Pager indicates that functions within mo dules

mapp ed to FileSystem make calls to functions

within mo dules mapp ed to Pager. The dotted

linesshowtheabsences,wherethesourcemo del

do esnotincludearcs predictedbythehigh-level

mo del. For instance, no calls were found b

e-tween mo dules mapp ed to Pager and mo dules

mapp edtoFileSystem. Thenumb er asso ciated

witheacharcinthegureisthenumb erofsource

mo delrelationvaluesmapp edtotheconvergence

or divergence; absence arcs are annotated with

thevaluezero.

Computation and display of the reexion

mo del shown in Figure 2 takes twenty seconds

on a DEC 3000/300 with our to ols, which

con-sistofseveralsmallC++[Str86]programsanda

userinterfaceimplementedinTCL/TK[Ous94].

Computed reexion mo dels are displayed using

AT&T's graphviz package. The to ols allow the

tuples. ThescreensnapshotinFigure3includes

a window \Arc Values" that shows the result

of selecting thedivergenceb etween FileSystem

and Pager. Valuesinthiswindowshowthe

call-ingandcalledfunctions,includingtheirdirectory

and leinformation.

In a one hoursession, the VM develop er was

abletoiterativelysp ecify,computeandinterpret

severalreexion mo dels. (Thesourcemo del was

extracted b eforehand, and it was not changed

during this session.) Informally, the develop er

found the representation of the source co de as

a reexion mo del useful in providing a global

overview of the structure of the NetBSD

im-plementation. For example, from studying sp

e-cic divergences, the develop er concluded that

the implementationof FileSystemincluded

op-timizationsthatrelyoninformationfromPager.

Thisinformationisuseful forplanningmo

dica-tions toeithermo dule.

3 Forma l Characterization

To make precise the meaning and computation

of a reexion mo del, we present a formal sp

eci-cation of a reexion mo del system using the Z

sp ecication language[Spi92].

3.1 Reexion Model

A static schema in Z describ es a state the

sys-tem can o ccupy as well as invariantsthat must

b e maintainedacross statetransitions. Any

sys-tem implementing the reexion mo del approach

must maintain the state comp onents and

in-variantsdescrib edintheReexionModel schema

presented b elow.

The ReexionModel schema uses two basic

typ es, HLMENTITY,whichrepresentsthetyp e

of a high-level mo del entity and SMENTITY,

which represents the typ e of a source mo del

entity. Four typ e synonyms are also dened:

HLMRelation and SMRelation, which dene

re-lationsover high-levelmo del entitiesand source

mo del entities resp ectively, and HLMTuple and

(4)

Memory

Pager

FileSystem

User

VMPolicy

HardwareTrans

KernelFaultHdler

VirtAddressMaint

Memory

HardwareTrans

0

0

51

VMPolicy

6

KernelFaultHdler

0

6

3

Pager

3

8

2

VirtAddressMaint

2

2

User

0

0

0

10

4

0

61

7

6

3

21

FileSystem

0

16

9

11

575

Module

Calls

Module

Absence

Convergence

Divergence

(a) High−level Model

(b) Reflexion Model

(5)

[HLMENTITY;SMENTITY]

HLMRelation ==HLMENTITY $HLMENTITY

SMRelation==SMENTITY $SMENTITY

HLMTuple==HLMENTITY2HLMENTITY

SMTuple==SMENTITY 2SMENTITY

ReexionModel

convergences :HLMRelation

divergences:HLMRelation

absences :HLMRelation

mappedSourceModel :HLMTuple $SMTuple

convergences\divergences=?

divergences\absences=?

convergences\absences=?

dommappedSourceModel =

convergences[divergences

The variables (ab ove the dividingline) in the

schema dene the information that must b e

maintained by a reexion mo del system. The

relation describ ed by the convergences variable

denes where a high-level mo del agrees with

by the divergences variable describ es where a

source mo del diers from a high-level mo del;

and therelation describ edbytheabsences v

ari-able denes where a high-level mo del diers

from a source mo del. The fourth variable,

mappedSourceModel, denes a relation that

de-scrib es which source mo del values contribute to

a convergent or divergent arc in the reexion

mo del. The information inmappedSourceModel

is used to supp ort op erations that aid an

engi-neer in interpreting a reexion mo del. For

ex-ample,theinformationinmappedSourceModel is

necessaryto supp ort a queryof the form shown

inFigure 3.

Inaddition todeclaringthestatecomp onents

of a reexion mo del, the staticschema includes

the denition of constraints (b elow the

divid-ing line) that must b e satised by all reexion

mo del systems. The rst three constraintsstate

that thevalues of the convergences , divergences

and absences relations are disjoint. The fourth

constraint states that the investigation of the

(6)

have nocontributing sourcemo del values.

3.2 Computing a Reexion Model

The dynamic schema, ComputeReexionModel,

presentedb elow,describ es thecomputationof a

reexion mo del from three inputs: a high-level

mo del, a source mo del, and a mapping from

the source to the high-level mo del. The

high-levelmo del(hlm?)isdescrib edasarelationover

high-level mo del entities,and the source mo del

(sm?)isdescrib edasarelationoversourcemo del

entities. The mapping (map?) is an ordered

list of map entries. Each map entry|dened

by the typ e MapEntry|names zero or more

source mo del entities and asso ciates with these

entities, one or more high-level mo del entities. 3

SMENTITYDESC represents the typ e of a

de-scription namingzero or moresourcemo del

en-tities(e.g., aregular expression over logicaland

physical softwarestructureinourto ols).

[SMENTITYDESC] MapEntry == SMENTITYDESC2(PHLMENTITY) ComputeReexionMod el 1ReexionModel hlm?:HLMRelation sm?:SMRelation

map?:seqMapEntry

mapFunc :(seqMapEntry2SMENTITY)!

(PHLMENTITY) mappedSourceModel 0 = S ft : SMTuple j t 2 sm? (mapFunc(map?;rstt)2

mapFunc(map?;secondt))2

ftgg convergences 0 = hlm? \(dommappedSourceModel 0 ) divergences 0 = (dommappedSourceModel 0 ) nhlm? absences 0 =hlm? n (dommappedSourceModel 0 ) 3

Anadditionalinvariantwhichconstrainsamapentry

to name onlyhigh-level mo del entities sp eciedin hlm?

function (mapFunc) that matches entities from

thesourcemo deltothesp eciedmap,pro ducing

aset ofasso ciatedhigh-levelmo del entities.

ThevalueofmappedSourceModel iscomputed

by pushing the elements of each source mo del

tuple through the map, resulting in two sets of

high-level mo del entities. The cross-pro duct of

these sets is taken and each element in the

re-sultant setis asso ciated(through another

cross-pro duct) with the original source mo del tuple.

Once the mappedSourceModel is computed, the

values of the convergences , divergences, and

absences relationsareeasilydeterminedthrough

set intersection and setdierence op erations.

3.3 A Family of Reexion Models

This Z sp ecication denes a family of

reex-ion mo del systems. Dierent kinds of reexion

mo del systems result dep ending on the choices

madeforrepresentingthesourcemo delentity

de-scriptionsinamap(SMENTITYDESC)andfor

dening the mapping function(mapFunc). Our

reexion mo del system describ es source mo del

entities using a combination of structural

infor-mation and regular expressions. Our to ols

sup-p ortthe useof twodierent mappingfunctions.

The most common mapping function used

pro-duces theset of high-levelmo del entities asso

ci-atedwiththerstmatchofagivensourcemo del

entitytoanentryinthemap. Alternatively,our

to olssupp orttheuseofamappingfunctionthat

returnsthesetofhigh-levelmo delentities

result-ing from the union of all matches found in the

map. Ourto olscanthusb econguredtoprovide

implementationsoftwodierentreexionmo del

systems.

4 Experience

The reexion mo del approach has b een applied

to aid engineers inp erforming a variety of

soft-ware engineering tasks on a numb er of dierent

(7)

implemen-crosoft Corp oration applied reexion mo dels to

assess the structure of the Excel spreadsheet

pro duct prior to a reengineering activity. The

Excel pro duct consists of over one millionlines

ofCsourceco de. Theengineercomputed

reex-ion mo delsseveraltimesa dayover afour week

p erio dtoinvestigatethecorresp ondenceb etween

ap ositedhigh-levelmo delandamo delofalmost

120,000callsandglobaldatareferencesextracted

from the source. A detailed mapping le

con-sistingof almost1000 entrieswaspro duced and

is b eing used to guide the extraction of comp

o-nents from the source. The engineer found the

approach valuable for understanding the

struc-ture of Exceland planning thereengineering

ef-fort.

DesignConformance Weusedasequenceof

reexion mo dels to compare the layered

archi-tecturaldesignofGriswold'sprogram

restructur-ing to ol [GN95] with a source mo del consisting

of calls b etween mo dules. The reexion mo del

highlighted, as divergences, a few cases where

mo dulesinthesourceco dedidnotadheretothe

layeringprinciples. Weareunawareofanyother

approachthat would allowsuch violationstob e

found sodirectly.

An industrial partner applied reexion mo

d-els tocheckifa6,000lineC++ implementation

of a subsystem matched design do cumentation

(intheformofaBo o chobjectdiagram[Bo o91])

preparedpriortoimplementation. Thiscasewas

uniqueinthatthereexion mo delwasfully

con-vergent withthesourcemo del.

System Understanding We used reexion

mo delstotrytodetermine whya compilerused

in undergraduate education at the University

of Washington was dicult for the students to

change. We computed a reexion mo del

com-paring an extracted Ada le imp orts relation

withaconventionalmo delofacompiler[PW92].

Thereexionmo delcontainedmeaningful

diver-gences b etween almostallpairsofhigh-level

en-tities. This high degree of coupling explains, in

part, why the students had diculty changing

version of the compiler, written in C++. The

reexion mo dels for this compiler were far less

cluttered than the Ada version. However, some

unexp ected interactionswereidentiedusingthe

reexion mo del;these mayprovidethe basisfor

either minor restructuring of thecompiler orat

least additionalwarningstothestudents.

5 Performance

Engineers iteratively sp ecify, compute, and

in-terpret reexion mo dels. The rate at which an

engineercan interpretanditeratereexionmo

d-els is dep endent, in part, up onthe the sp eed of

the computation.

The formal characterization presentedin

Sec-tion 3 provides a basis for considering the

theoretical complexity of computing a

reex-ion mo del. From the dynamic Z schema,

ComputeReexionModel, we can see that the

time complexityof computinga reexionmo del

is dep endent up on the cost of computing the

mappedSourceModel relation and the cost of

comparing that computed relation to the

high-levelmo del. Anupp er b ound,then,on the

com-plexityofcomputingthemappedSourceModel

re-lation isgivenby:

O(#sm 2 #map 2t

comparison

) 2O((#hlm) 2

)

where#sm isthecardinalityofthesourcemo del

relation,#map isnumb er of entries inthemap,

t

comparison

is the cost of comparing a source

mo del entity to a source mo del entity

descrip-tion ina mapentry,and #hlmisthecardinality

of the high-levelmo del relation. Since the

num-b er of entities in the high-level mo del is

gener-ally smalland constant,theO((#hlm) 2

) can,in

practice, b eignored,yielding:

O(#sm 2 #map 2 t

comparison )

Our initial implementation of to ols for

com-puting reexion mo dels p erformed the

(8)

straightfor-ciently fast for mo derately large systems (with

sourcemo delsconsistingof tens of thousandsof

tuples) and small maps (tens of lines), but was

notfast enough tosupp ort theiterative

compu-tation of reexion mo dels for larger systems or

larger maps. For example, an early version of

our to ols required 40 minutes on a Pentium to

computea reexion mo del forExcel.

By trading space for time inthe

implementa-tion of our to ols, we have b een able to supp ort

the computation of reexion mo dels for large

software systemsand large maps ina minute or

two (see Table 1). Sp ecically, our to ols hash

thematchofhigh-levelmo delentitiesforagiven

sourcemo delentitythersttimeasourcemo del

entityisseen. Theadditionalspacerequirements

dep endup onthenamingschemeusedforsource

mo delentitiesand thenumb erofuniqueentities

in the source mo del. In the case of Excel and

the naming scheme used byour to ols, there are

18,118uniquesourcemo del entitieseach

requir-ingon theorder of 100bytes (lessthan2 Mb in

total). Our to olsprovidea varietyof options to

lettheengineerdeterminetheappropriatespace

timetradeo.

6 Discussion

Reexionmo delsp ermitanengineertoeasily

ex-plore structural asp ects of a large software

sys-tem. Thegoaloftheapproachistoprovide

engi-neerswiththeexibilitytopro duce, atlow-cost,

high-level mo dels that are \go o d enough" for

p erformingaparticularsoftwareengineeringtask

(restructuring, reengineering, or p orting, etc.).

Three asp ects of the approach critical to

meet-ingthisgoalaretheuseofsyntacticmo dels, the

useof expressive declarativemaps, and supp ort

forqueryinga reexion mo del.

Syntactic Mo dels Asdescrib edintheformal

characterization,reexion mo delsare computed

withoutanyknowledgeoftheintendedsemantics

ofthehigh-levelorthesourcemo del. Ab enetof

thissyntacticapproach isthe abilityof an

engi-ware system (calls, data dep enden ces, or event

interactions,etc.). It also means, however, that

it is the engineer's resp onsibilityto ensure that

it makes sense to compare a selected high-level

mo del with an extracted source mo del. For

ex-ample, comparing the sp ecied calls diagram of

Figure 2awithanextractedcallsrelationmakes

sense, but comparingthe samehigh-levelmo del

with a source mo del representing the #include

structure of NetBSD would probably b e

mean-ingless.

Inpractice,engineers haveexploitedthis

ex-ibility by changing the meaning of their mo

d-els over time. For example, b oth the VM

de-velop er and the Microsoft engineer rst used

a calls source mo del and later augmented the

source mo dels with static dep endence s of

func-tion denitionsonglobaldata. Byadding static

dep ende nces, b oth develop ers implicitly shifted

their high-levelmo del from a callsdiagramto a

communicates-withdiagram. In b oth cases, the

changes weredriven by theneed tounderstand,

for thetaskb eingp erformed, additionalasp ects

of thesystem structure.

To aid the engineer in interpreting reexion

mo delscomputed withmo delscontaining

dier-ent kinds of information,we are addingsupp ort

fortypingrelationsinb othsourceandhigh-level

mo dels.

Maps The declarative maps used in

comput-ingareexionmo delenableanengineertofo cus

on information of interest in the source in two

ways. First, an engineer may sp ecify a partial

mapthat containsentriesforonlythose partsof

thesystemrelevant tothetaskathand. Second,

an engineer may iterativelyrene a map to the

appropriate levelofdetail necessaryforthetask

b eing p erformed.

Generally, an initial reexion mo del is

com-puted with a rough and partial map. Then,

basedon aninvestigationof thereexion mo del,

an engineer renes the map in the areas of

in-terest untilthenecessary informationab out the

system is obtained. Sometimes, as was thecase

(9)

Linesof Mo del Mo del 100MHz

Co de (Tuples) (Lines) (Entities) (min:sec) (min:sec)

UW compiler C++ 3,700 607 33 5 :00.5 :01 UW compiler Ada 4,200 72 9 5 :00.2 :01 Restructur ing To ol CLOS 47,000 5,855 215 9 :04.3 :06 NetBSD (VM) C 250,000 15,657 7 8 :04.0 :19 Excel C 1,200,000 119,637 971 15 2:13.0 4:05

Table1: Performanceof ReexionMo del To ols

fairly rough map was sucient. In contrast, in

thecase ofExcel,a detailedmapwasdesired to

planreengineering activities.

The reexion mo del approach enables an

en-gineer to balance the cost of rening the map

with thelevelof detail necessary forp erforming

a particularsoftware engineering task. We plan

totracktheuseofsomedeclarativemapsacross

theevolutionofseveralsystemstodeterminethe

degree ofsensitivityofourmappinglanguageto

changesmadeinthesource. Thiswillaidan

en-gineer in judging the amortization costs of

cre-atingdetailedmaps forlargesystems.

QueryingReexionMo dels Reexionmo

d-els bridge the gap b etween an engineer's

high-level mo del and a mo del of the source. The

convergences,divergences,andabsences

summa-rize selected interactions in the source, while

the mappedSourceModel captures the

connec-tions b etween the high-level and source mo del

arcs. Based on the summary information, the

engineer intersp ersestwo kindsof queries to

in-terpret a reexion mo del for a sp ecic software

engineeringtask.

In the rstkind of query, an engineer

investi-gates thesource mo del values contributing to a

convergence or divergence. In the second kind,

an engineer p erforms queries to determine the

source mo del entities and values that were not

included in thereexion mo del. This query

en-ables an engineer to assess whether the map is

absencesinthecomputedreexionmo delarethe

result of incompletenessinthemap.

Based on theresults of the interpretation,an

engineer mayeitherdecide torene one ormore

of the inputs, computing a subsequent reexion

mo del,orelsemaydecidethatsucient

informa-tion hasb eenobtainedtopro ceedwiththe

over-alltask. Tob ettersupp ortanengineerinthe

in-terpretationpro cess,wearecurrentlydeveloping

techniques to improve thequerying and

investi-gationof a seriesofcomputed reexion mo dels.

7 Related Work

Reverse Engineering The reverse

engineer-ing approaches closestto ours use clustering

in-formation,whichisgenerally culledfroma

com-bination of human input and numerical

com-putation, to create abstract representations for

theengineer. Examples ofthisapproachinclude

Rigi [MK89] and Schwanke's statistically-based

architectural recoverytechnique [Sch91].

Reexion mo dels dier in a numb er of ways.

First,in ourapproach theengineer sp ecies the

high-level entities explicitly, whereas the

archi-tectural recovery systems instead infer them.

Second, we fo cus on comparing high-level and

source mo dels, ratherthan on discovering

high-level mo dels. Third, our mappings are

declara-tive,asso ciatingsourceandhigh-levelentities,in

(10)

engineer to build explicit hierarchicalmo dels of

the software structure of a system. We instead

supp ort the investigation of substructure (i.e.,

a subsystem) by computing dierent reexion

mo delsatvariouslevels of abstraction.

Mo del Comparison Ossher has considered

the comparison of relational mo dels of software

structure for a xed typ e of high-level mo del, a

GRID[Oss84]. TheintentofaGRIDdescription

of a system is to \sp ecify, represent, do cument

and enforce the structure of large, layered

sys-tems" [Oss87, pg. 219]. The GRID mechanism

p ermits an engineertocho osea concise

descrip-tion ofthesoftwarestructureand thento

anno-tate how the actual structure of the system

de-viatesfromthedescription. Ourapproachisnot

intendedtosp ecifyorenforceadesiredstructure

ofthesystem,butrathertocomparetwomo dels

of the structure using an explicit mapping b

e-tween the twomo delsprovidedby theengineer.

Jackson's Asp ect system [Jac93] compares

partial program sp ecications (high-level mo

d-els) to data ow mo dels extracted from the

source to detect bugs in the source co de that

cannot b e detected using static typ e checking.

Asp ectuses dep endenc es b etween datastores as

a mo del of the b ehavior of a system and

as-sumesthatthismo deliscorrect,fo cusingonhow

thesourcemo del diersfromthep ositedb

ehav-ior. Incontrast,wefo cusonhigh-levelstructural

mo dels, and we are interested in b oth how the

source mo del diers from the high-level mo del

and also how the high-level mo del diers from

thesource.

Jackson has also develop ed a semantic

dier-ence approach for comparing the dierences in

inputand outputb ehaviorb etween twoversions

of a pro cedure [JL94]. This approach derives

an approximate mo del of the semantic eect of

a pro cedure consistingof a binary relation that

summarizesthedep ende nceofvariablesafter

ex-ecutingthepro cedureup onthevalueofvariables

at the entry to the pro cedure. The semantic

dierencing to ol compares the binary relations

resulting from dierent versionsof a pro cedure.

declarativemapping.

8 Summary

A reexion mo del summarizes information

ex-tracted fromsourceco deintoahigh-levelmo del

that is suciently accurate to supp ort an

engi-neer in p erforming a software engineering task.

The engineer denes three inputs to a reexion

mo delcomputation: ahigh-levelmo del,asource

mo del,and amap. Thereexion mo delpresents

the summary information in the context of the

high-level mo del dened by the engineer. The

engineerinterpretsanditerativelycomputes

suc-cessive reexion mo delsuntil satised. The

fea-sibilityandexibilityof theapproachhaveb een

demonstrated throughitsapplicationin a

num-b er ofdierentsettingson systemsrangingfrom

severalthousandtooveronemillionlinesofco de.

Acknowledgments

Rob ert Allen, Kingsum Chow, David Garlan,

BillGriswold,MichaelJackson, KurtPartridge,

Bob Schwanke, and Michael VanHilst each

pro-vided helpful comments on earlier drafts of

the pap er. Conversations with Daniel Jackson

help ed clarifya numb er of asp ects of our work.

AnanonymousMicrosoftengineerapplied

reex-ion mo dels to Excel. Dylan McNamee was our

VMdevelop er. PokWongappliedreexionmo

d-elsaspartofadesignconformancetask.Stephen

NorthofAT&Tprovidedthegraphvizgraph

dis-play and editing package. We also thank the

anonymous referees for their constructive

com-ments.

References

[AKW79] A.V. Aho, B.W. Kernighan, and P.J.

Weinb erger. Awk { A Pattern

Scan-ningandPro cessing Language. Software

{Practiceand Experience,9(4):267{280,

1979.

[Bo o91] G. Bo o ch. Object-oriented Design

(11)

turalTradeosforaMeaning-Preserving

Program Restructuring To ol. IEEE

Transactions on Software Engineering,

21(4):275{287,April1995.

[Jac93] D.Jackson. Abstract Analysis with

As-p ect. In Proceedings of the 1993

Inter-nationalSymposiumonSoftwareTesting

and Analysis,pages19{27,1993.

[JL94] D. Jackson and D.A. Ladd. Semantic

Di: A To ol For Summarizing the

Ef-fects ofMo dications. In Proceedings of

theInternationalConferenceonSoftware

Maintenance,Septemb er 1994.

[KR78] B.KernighanandD.Ritchie.TheCPr

o-grammingLanguage.PrenticeHall,1978.

[MK89] H.A.M uller and K. Klashinsky. A

Sys-tem for Programming-in-the-large. In

Proceedings of the 10th International

Conference on Software Engineering,

pages 80{86. IEEE Computer So ciety

Press,April1989.

[Oss84] H.L.Ossher.ANewProgramStructuring

Mechanism Based on Layered Graphs.

PhDthesis,StanfordUniversity,

Decem-b er1984.

[Oss87] H. Ossher. A Mechanism for Sp

eci-fying the Structure of Large, Layered

Systems. In Bruce Shriver and

Pe-terWegner,editors,Research Directions

in Object-Oriented Programming, pages

219{252.MITPress,1987.

[Ous94] J.K.Ousterhout.TCL&theTKToolkit.

Addison-Wesley,1994.

[PW92] D.E. Perry and A. Wolf.

Founda-tionsfortheStudyofSoftware

Architec-ture. ACM SoftwareEngineering Notes,

17(4):40{52,Octob er 1992.

[Rei95] S.P. Reiss. The FieldProgramming

En-vironment: A Friendly Integrated

Envi-ronment for Learning and Development.

KluwerAcademic Publishers,1995.

[Sch91] R.Schwanke.An IntelligentTo olfor

Re-engineeringSoftwareMo dularity.In

Pro-ceedings of the 13th International

Con-ference on Software Engineering, pages

83{92,May1991.

[Smi84] B.C. Smith. Reection and Semantics

guages Conference, pages 23{35. ACM,

Decemb er 1984.

[Spi92] J.M. Spivey. The Z Notation. Prentice

Hall,second editionedition,1992.

[Str86] B. Stroustrup. C++Programming

Lan-guage. Addison-Wesley,1986.

[WTMS95] K. Wong, S.R. Tilley, H.A.M uller, and

M.D. Storey. Structural Redo

cumenta-tion: A Case Study. IEEE Software,

References

Related documents

Understanding why stock ownership can bias directors’ objectivity, and examining how board discussion transparency can yield differential effects for stock- owning and

Short-term effects of therapeutic claw trimming in acutely lame cows (n = 21) with nonadvanced claw horn lesions on the endocrine, metabolic, and behav- ioral stress responses

Looking ahead, substantial research investment will be needed to ensure that we do not forgo the opportunity to learn how efficiency changes, as chronic care is becoming

There are several factors within the single-sex environment at Assisi that are also important in the outcome of leadership development: a number of positive role models, a focus

électrique, Département de génie électrique et génie informatique, Université du Québec à Trois-Rivières, Mai

In this present study, the optimized molecular geometries and vibrational spectra of 2,5-DBP have been calculated by using ab initio Hartree-Fock (HF) and density functional

Objective: To determine the incidence and associated factors of multidrug-resistant hospital- associated infections (MDR-HAI) in a pediatric intensive care unit (PICU) of a

This paper shows that ignoring price heterogeneity in a differentiated good market yields price-cost ratio estimates severely biased towards one regardless of competitiveness