Filipp oRiccaandPaoloTonella
ITC-irst
Centrop er laRicercaScienticaeTecnologica
38050Povo(Trento),Italy
fricca,[email protected]
tel.+39.0461.314592,fax+39.0461.314591
Abstract. Webapplicationsareb ecomingincreasingl ycomplexand
im-p ortant forcompanies.Theirdesign, development, analysis and testing
needthereforetob eapproachedbymeansofsupp ortto ols andmetho
d-ologies.Inthispap erweconsidertheproblemsrelatedtobuildingto ols
forthe analysis and testing ofWebapplicati ons and we tryto provide
someindicationsonp ossible solutions,basedup onourexp erience inthe
development oftheto ols ReWebandTestWeb.
Thedenition ofaprop er reference mo delwill b ediscussed, as well as
theimpact of dynamic pages duringWeb sitedownloading and
subse-quentmo delconstruction.Visualizati ontechniquesaddressingthelarge
amountofextracteddatawillb epresented,whileinfeasibili ty problems
willb econsidered withreferencetothetesting phase.
1 Introduction
Inthelastyears,Webapplicationshaveb ecomeimp ortantassetsforseveral
com-panies,b eingaconvenientandinexp ensive waytoprovidepro ductinformation,
e-commerceandserviceson-line.SinceasoftwarebuginaWebapplicationcould
interruptanentirebusinessandcostmillionsofdollars,thereisastrongdemand
formetho dologies,to olsandmo delsthat canimprove thewebsite qualityand
reliability [7,8].For example, to ols can supp ort develop ers understanding the
abstractstructure of aWebapplicationbymeansofviewsandanalyses,
ensur-ingthattherequirementsp ecicationsaresatisedbytheapplication,andthey
canhelpinthetestingphase.Developingato olthatextracts amo delofaWeb
application,implementssomestaticanalysesandsupp ortsthedevelop ersinthe
testing phase is noteasy.Main problemsare relatedto:mo delingthe abstract
structure ofWeb applications,adaptingknown analysisand testing techniques
tothecharacteristicsofWebbased systems,andvisualizinglargegraphs[6,10].
Onlyfewworks haveinsofarconsideredtheproblemsrelatedto Websitestatic
analysis,maintenance,testingand to buildingtheasso ciatedto ols. One ofthe
rst systematic studies on Webmaintenance is [12],where theauthors
recog-nize the similarity b etween software systems and web based systems and the
imp ortance of the maintenance phase. They have built a to ol called SiteSeer
that downloadsweb sites and computes some metricson them.The pap er [9]
describ es SPHINX,aJavato olkitandinteractive developmentenvironmentfor
Webspiders.SPHINX consistsoftwoparts:theSpiderworkb ench,a
customiz-ablespider that supp ortsagraphical user interface and visualizesthewebsite
recoveredas agraph,andtheWebSPHINXclasslibrary,thatprovidessupp ort
forwritingWebspidersinJava.TheCAPBAK/Webto ol,explainedin[8],is a
webtesting to ol that supp orts functionaltesting and regression testing. In [7]
anapproachto dataowtestingofWebapplicationsispresented.
Inthispap erweconsidertheproblemsrelatedtobuildingto olsforthe
anal-ysisandtesting ofWebapplicationsandwetry to providesomeindicationson
p ossible solutions,based up on our exp erience in the developmentof the to ols
ReWeband TestWeb.ReWebdownloadsand analyzes the pages of aWeb
applicationwithatwofoldpurp ose:buildingamo deloftheapplicationand
sup-plyingsomeviewsandanalysestothedevelop er.TestWeb,astructuraltesting
to ol, generates and executes a set of test cases for a Web application whose
mo delwascomputedbyReWeb.
Theremainderofthispap erisorganizedasfollows:thenextsectiondescrib es
agenericWebapplicationinfrastructure,Section3intro ducesthegeneral
archi-tecture of our to ols andpresents the adopted analysis mo delfor Web
applica-tions,Sections 4and 5explain problemsencountered and solutionsadopted in
thedevelopmentoftheto olsReWebandTestWeb.Finally,Section6concludes
thepap er.
2 Web Applications
AtypicalgenericWebApplicationinfrastructureisshowninFigure1(asimilar
schemaisprop osed in[13]).
Web
Browser
Web
Server
Back-end
Services
HTTP Response
HTTP Request
Request
HTML Output
Content Response
Content Request
Request
Response
Presentation
Content
Application
Data and
Layer
Layer
Layer
Request
Response
Service Layer
Databases
Repository
(dynamic contents)
(static information)
Application
Server
The browser sends the requests via HTTP to the server for an interactive
view of Web pages. Web pages can b e static or dynamic. While the content
of astatic page is xed and stored in a rep ository, the content of a dynamic
pageiscomputedat run-timebytheapplicationserver andmaydep end onthe
informationprovided by the user through input elds (a similardistinction is
prop osedin[4]and[5]).Theprogramsthatgeneratedynamicpagesatrun-time,
as forexampleCGI scriptsand servlets,run onthe applicationserver andcan
useinformationstoredindatabasesandotherresources.TheWebserverandthe
applicationservercanb elo catedonthesamemachineorondierentmachines.
Similarlyto [5],we classifyWebapplications 1
according toataxonomy,
or-deredbygrowingcomplexity,whichischaracterizedbydynamism,page
decom-p ositionanddataow.
{ Level0:staticpageswithoutframes.
{ Level1:staticpageswithframes.
{ Level2:dynamic pageswithoutdatatransfer fromclient.
{ Level3:dynamic pageswithdatatransferfromclient.
The diÆculties and problems in the construction of a to ol, that supp orts
develop ersinthephases ofanalysisandtestingofWebapplications,growwith
increasinglevelsinthetaxonomy.Applicationsatlevels2and3typicallyexploit
informationstoredinsideadatabasetobuildthecontentofthedynamicpages.
Allfourlevelsareinthescop eoftheprop osed techniques.
3 Tool Architecture
Output
pages
Coverage
Pass/fail
values
Expected
UML
model
TestWeb
ReWeb
Internet
URL
Input forms
Views
Analyses
Uses,
Dynamic pages,
Cond. edges
Input values
Test criterion
Fig.2. RolesofRewebandTestWeb.
Thetwoto olsReWebandTestWebhave b eendevelop edtosupp ort
anal-ysis and testing of Web applications. Their relative roles are schematized in
1
Although some authors distinguish b etween Websites and applicatio ns, using the
Figure2.ReWebdownloadsandanalyzes thepagesofaWebapplicationwith
the purp oseof buildingamo delof it andpro ducing someanalyses andviews.
TestWebgeneratesandexecutesasetoftestcasesforaWebapplicationwhose
mo delwascomputedbyReWeb.Thewholepro cessissemi-automatic,andthe
interventionsoftheuserareindicatedwithindiamondsinFigure2.Explanations
onthemanualinterventionswillb egiveninthefollowingsections.
Both to ols p erformtheir op erations onan abstraction of theWeb
applica-tions, indicated inFigure 2as UML mo del. UML,the Unied Mo deling
Lan-guage[3],wasexploitedtoexpresssuchamo del.Letusconsiderthekey
require-mentsonthemo del.Weareinterestedinamo delthatcanb edirectlyabstracted
fromtheimplementation.Someimp ortantcharacteristicsthatitshouldhavecan
b esummarizedasfollows:
{ Thefo cusshouldb eonthenavigational featuresofthesite;
{ Itshouldb ecompletei.e.themostimp ortantentitiesas,forexamplelinks,frames,
formsanddynamicpagesmustb eexplicitl y representedin themo del;
{ Itshouldb ep ossible toprovide (partial)automatic supp ortforitsextraction;
{ It should b e p ossible to apply to it some staticanalyses and testing techniques
derivedfromthoseusedwithtraditionalsoftwaresystems.
{ Itshouldb ep ossible toderive someviews fromitthatrepresentthe Websitein
anintuitive mo de;
WebPage
Form
input: Set<Var>
Frame
StaticPage
DynamicPage
use: Set<Var>
link
LoadPageIntoFrame
f: Frame
ConditionalEdge
{optional}
c: Condition<Var>
{optional}
include
into
initial page
submit
split
split into
0..*
0..*
0..*
0..*
1
{only from DynamicPage}
0..1
Fig.3.Meta mo delofagenericWebapplicationstructure.
Figure3showsourmetamo delused todescrib e theelementsinthemo del
b e displayed to theuser, and the navigation linkstoward other pages. It also
includes organizationandinteractionfacilities(e.g.,framesandforms).
Thetwosub classesof WebPagemo delthestaticanddynamicpages.When
thecontentofadynamicpagedep ends onthevalueof aset ofinputvariables,
theattributeuseofclassDynamicPage containsthem.
Aframeisarectangulararea inthecurrent pagewherenavigationcantake
placeindep endently.Moreover thedierentframesintowhichapageis
decom-p osedcaninteractwitheachother,sincealinkinapageloadedintoaframecan
forcetheloadingofanotherpageintoadierentframe.Thiscanb eachievedby
addingatargettothehyp erlink.Organizationintoframesisrepresentedbythe
asso ciationsplitinto,whosetargetisasetofFrameentities.Framesub division
mayb erecursive(auto-asso ciationsplitintowithinclassFrame),andeachframe
hasaunaryasso ciationwiththeWebpageinitiallyloadedintotheframe(absent
incase ofrecursive sub divisionintoframes).WhenalinkinaWebpageforces
theloadingofanotherpageintoadierentframe,thetargetframeb ecomesthe
datamemb erof the(optional)asso ciationclassLoadPageIntoFrame.
InHTML user inputcanb e gathered byexploitingforms. AWebpagecan
include any numb er of forms(asso ciation include). Each form is characterized
by theinput variables that areprovided by the user through it (data memb er
input).ValuescollectedbyformsaresubmittedtotheWebserverviathesp ecial
linksubmit,whosetargetisalwaysadynamicpage.Sincelinks,framesandforms
are partofthecontent ofaWebpage,and fordynamicpagesthecontentmay
dep end on the input variables, even the organization of a page is, ingeneral,
notxedanddep ends ontheinput. Thisisthereason fortheasso ciationclass
ConditionalEdge, which optionally adds a b o olean condition, function of the
input variables, representing the existence condition of the asso ciation (which
caninturnb ealink,anincludeorasplitinto).Thetarget,page,formorframe,
isreferenced bythesource dynamicpageonlywhentheinputvaluessatisfythe
conditionintheConditionalEdge.
4 ReWeb
TheReWebto olconsistsofthreemo dules:aSpider,anAnalyzerandaViewer.
TheSpiderdownloadsallpagesofatargetwebsite,startingfromagivenURL
and providingthe input required by dynamic pages, and it builds a mo del of
the downloaded site. Each page found withinthe site host is downloadedand
markedwith thedate ofdownloading.TheHTML do cumentsoutside the web
site host arenotconsidered. The user hasto sp ecify the set ofinputs foreach
pagethatcontainsForms.TheAnalyzerusestheUMLmo delofthewebsiteand
the downloaded pages to p erform several analyses, presented in the following,
some of which are exploited during static verication. The Viewer provides a
navigationand query facilities. Web Spider and Analyzer are written in Java,
whiletheViewer isbasedonDotty 2 . 4.1 Spider
SPIDER(target url)
1 UML Model
;2 Error urls
;3 Pages already visited
;4 Urls found
;5 S
ftarget url
g6
while(S
6=
;)
7
choosen url
chooseElement(S)
8
S
S
nfchoosen url
g9
ifnot(choosen url
2Pages already visited)
then10
if(choosen url is OK)
then11
Pages already visited
Pages already visited
[fchoosen url
g12
if(choosen url is a HTML page)
then13
Download(choosen url)
14
Urls found
scanPage(choosen url)
15
S
S
[Urls found
16
AddElementsToModel(UML Model, choosen url, Urls found)
17
endif18
else19
Error urls
Error urls
[fchoosen url
g20
endif21
endif22
endwhileFig.4.Pseudo-co de oftheSpider.
Web pages are not actually written using a single language. They can b e
ratherregardedasmultilanguagedo cuments,whereco defragmentsinlanguages
dierentfromHTMLcanb eloaded(e.g.Applets)orinterpreted(e.g.Javascript).
LibrariesfortheconstructionofSpiderprogramsareavailableforprogramming
languagessuchasPerl,C/C++,orJava.AnexampleistheWebSPHINXclass
library [9].We decided to implement ourWebSpider just exploiting the Java
language and its standard library, starting from scratch, in order to have
to-talcontrolonthemultilingualasp ects ofthedownloadedpages.Wedevelop eda
parserwhichrecognizesb othHTMLandJavascriptco defragments,andextracts
theneeded information(links,forms,frames,etc.)fromthem.
Figure4showsthepseudo-co deofourSpider.Thepro cedureSPIDER takes
agivenURLininput and buildsthe asso ciatedUML mo del. Theb o dy ofthe
commandwhile(containedinlines6-22) isexecuted untilthereareelementsin
theset S.ThefunctionchooseElement (line7) cho oses anelementinthesetS
whiletheconditionchosen urlisOK(line10)istrueifchosen urliswell-formed
andthecorresp onding pageexists intheWebsite.The conditionchosen urlis
2
an HTMLpage (line12)is true ifthecontenttyp eof thedo cumentconnected
tochosen urlisHTML.Inline13thepro cedureDownloadiscalledtostorethe
retrievedpageinthele-system.ThefunctionscanPage(line14)scansthepage
andreturnsthesetofURLsfoundwithinitandcontainedinthesite host.The
pro cedure AddElementsToModel (line 16) adds no des and edges to the mo del
inaccordance withourmetamo del.Iftheset Sisimplementedas astack,the
algorithmvisitstheWebsiteindepth-rstway,whileusingaqueue pro duces a
breadth-rst visit.
Problems encountered inthe construction of the Spider are due to
irregu-larities and ambiguities present in HTML co de, also noted in[12], and to the
currentstateoftheWebtechnology,oeringalargesp ectrum ofalternatives to
implementawebsite.Oursolutionwastoimprovetherobustnessoftheparser,
so that it could accept a sup erset of HTML including the main irregularities
commonlyrecognizedandprop erlyinterpreted byavailablebrowsers.
Dynamicpagesp ose additional problemto theactivityof theSpider.Since
the content of these pages is decided at run time, it may in general dep end
on the input previously provided by the user. In particular, the structure of
a dynamicpage may change when it isencountered in adierent interaction.
Since the mo del of aWeb site encompasses all p ossibilities,the Spider has to
recover all variants of a dynamic page, and has to merge them into a single
representative object.Thiscanb eachievedbysp ecifyingtheinputvaluestob e
providedb eforedownloadingthedynamicpageof interest. Moreover,thesame
dynamicpagehas to b edownloadedseveral times,with dierentinputs, when
thedierentconditionsgenerateadierentpagestructure.Allsequencesofinput
valuestob e providedb eforeeachpagedownloadaresp eciedinalewhich is
readbytheSpider.Alldynamicpagessp eciedinthelearedownloadedafter
providingtheWebserver withthegiveninputs.Finally,allversionsofthesame
page are merged.Although the numb erof inputs to b e provided mayexplo de
combinatorially, out exp erience suggests that in practice few alternatives are
suÆcienttocoverallvariants.
Anadditionalinputthatmayaectthecontentdisplayedinadynamicpage
is the cookie that the browser provides to the Web server. A cookie is a user
identier thatis stored bythe browser inthelo calle systemand is provided
totheWebservertoallowuseridenticationeachtimeanewconnectionwitha
givenserverisestablished.Afterrecognizingtheuser,theWebservercanprovide
acustomizedversionofthedynamicpagesinthesite.Sincetheirstructuremay
dep end ontheco okie,theSpiderneeds theabilitytosendaco okietotheWeb
server, in order to obtain also the pages that are generated when the user is
identied.
4.2 Analyzer
The UML mo del of a Web site can b e interpreted as a graph by asso ciating
objects withno des andasso ciations withedges. Somesimpleanalysesmay
FLOWANALYSIS(graphG=(N,E))
1 foreach(n2N)
2 initializeINnandOUTn
3 endfor 4 change true 5 while(change) 6 change false 7 foreach(n2N) 8 INn L p2pred(n) OUTp 9 OLDOUT n OUT n 10 OUT n GEN n S (IN n ,KILL n ) 11 ifOUT n 6=OLDOUT n then 12 change true 13 endif 14 endfor 15 endwhile
Fig.5.Pseudo-co deoftheowanalysisalgorithm.
page. They are obtained as the dierence b etween the pages available in the
Webserver le systemand thosedownloadedbytheSpider.Ghostpages are
asso ciatedwithp endinglinks,whichreference anonexistingpage.
More advanced analyses[11] can b e derived fromthegeneral frameworkof
owanalysis[1],describ ed bythealgorithminFigure5.Thealgorithm
propa-gatesowinformationinside agraphuntilthex-p ointisreached.Thekindof
owinformationtob epropagateddep endsonthepurp oseoftheanalysisb eing
p erformed.Some examplesaregivenb elow.Moreover,the conuence op erator
exploitedatline8tocollectoutgoinginformationfrompredecessorno desisalso
dep endentontheanalysis,andistypicallyeither theintersectionortheunion.
Afterinitializingtheinputandoutputsetsofeachno de (IN
n
andO UT
n ,lines
1-3)withtheinitialowinformation,propagationisachievedinsideax-p oint
lo op(line5)bysubtractingthedestroyedinformation(KILL
n
set)andadding
thegeneratedinformation(GEN
n
set)totheincominginformation(line10)for
each no deninthegraph.
An example of analysis which sp ecializes the algorithm in Figure 5 is the
computation of the reaching frames, which determines the set of frames in
which each page can app ear.Whena pageis loaded into aframe as its initial
pageorisreachablethroughanedgedecoratedwithaLoadPageIntoFrame
asso-ciationclassinstance,itgeneratesthenameoftheframeasowinformation.By
propagatingsuchinformationalongthesitegraphuntilthex-p ointisreached,
thereaching frames ofeach pageare determined.The outcomeofthereaching
framesanalysis isusefulto understand theassignmentofpagesto frames.The
presence of undesirable reaching frames is thus made clear. Examples are the
p ossibilityto loada page at the toplevel, while it was designed to always b e
loaded into agivenframe,or thep ossibilityto loadapage intoaframe where
itshould notb e.
Flowanalysescanb eemployedinamoretraditionalfashiontodeterminethe
graph.Ifadenitionofavariablereachesano dewherethesamevariableisused
(useattributeofadynamicpage),thereisadatadep endence b etweendening
no de and user no de. Datadep endences areuseful to representthe information
owsintheapplication.Theymayrevealthepresenceofundesirablep ossibilities,
such as using a variable not yet dened or using an incorrect denition of a
variable.Datadep endencesarealsoextremelyimp ortantfordynamicvalidation,
whendataowtestingtechniquesare adopted.
Whenthepagesofasitearetraversed, itisimp ossibletoreach ado cument
withouttraversing aset of other pages,called itsdominators.Sites inwhich
traversing a given page is considered mandatory, e.g., b ecause it contains
im-p ortantinformation,willhaveitinthedominatorsetofevery no de.Dominator
analysis,alsoderivedfromthealgorithminFigure 5,automatesthecheck.
Theevolutionofwebsites [10,12] isanother interesting object of
investiga-tion. Such ananalysis requires theabilitytocompare successive versions ofits
pages and to graphically display the dierences. Given two versions of a web
site,downloadedatdierentdates,theircomparisonaimsatdeterminingwhich
pageswereadded,mo died,deletedorleftunchanged.Itcanb ecombinedwith
thestaticanalysesdescrib ed ab ove,since theirre-computationover timeallows
controllingtheevolutionoftheapplicationquality.
4.3 Viewer
ThegraphviewofaWebapplicationis agraph,whoseno descorresp ondtothe
objects inthe mo deland whose edges corresp ond to the asso ciations b etween
objects. Lab eled edges are used for the links having a LoadPageIntoFrame or
ConditionalEdge relationsp ecier. Inthegraphview,to intuitivelysuggest
de-comp osition into frames, we adopt the convention of joining horizontally the
no des of typ e frame contained in the same page, and collapsing the edges of
typ e \split into" into asingle edge. An example of decomp osition into frames
is shown inFigure6. Page madmaxpub/index.html(the mainpage)is divided
intotwoframeswithidentiersaandb,andframeaisusedasamenutoforce
theloadingofpagesintotheotherframe.
Thegraphviewofawebapplicationcanb eenrichedwithinformationab out
itshistory[10],bycoloringtheno desandasso ciatingdierentcolorstodierent
timep oints(see Figure6).Inparticular,ascaleofcolorsrangingfromtheblue,
goingthroughthegreenandreachingtheredcanb eemployedtorepresentno des
added/mo diedinthefarpast,inthemediumpastormorerecently.
TheViewer is based on Dotty,and uses the algorithmexplained in [6] for
drawingdirectedgraphs.Theaestheticprinciplesfollowedbythealgorithmare:
to exp ose hierarchical structure (if any) in the graph, to avoid edge crossings
(ifp ossible) andsharpb ends, to keep edgesshort, tofavorsymmetryand
bal-ance.Thelayoutalgorithmof thegraphviewofaweb siteisveryimp ortantto
understanditsstructure,esp ecially whenthesite isverycomplex.
Another problemconnected with thevisualization of awebsite is thefact
diÆ-b
b
b
b
Fig.6.Coloredgraph viewofthesitewww.ubicum.it/madmaxpubatdate
3-2-2000.
abstract, to simplify,to extract ap ortionof,or to seeonlyapart of thegraph
view.Anotherp ossibilitycanb etoaddotherviewsofawebsiteasforexample
thebirdeye andoverviewdiagramsor todisplayonlythedepth-rst tree
with-outreturn edges(solutionadopted in[9]). Theviews and facilitieswe prop ose
follow. The system view represents the organization of pages into directories;
thedataow viewdisplaystheread/writeaccessesofpagestovariables,resp
ec-tivelythrough incoming/outgoingedgeslinking pagesto variables; thehistory
representation with percentage bars describ es, incompactway, thep ercentages
ofno deswiththesamecolor.Amongtheprovidedfacilities,theviewersupp orts
zo om, search, deletion of incomingor outgoingedges, and fo cus. Thefacilities
forfo cusing onandsearching ano deare usefulwhenthevisualizedgraphs are
very large. By exploiting the fo cusing facility it is p ossible to display only a
limitedneighb orho o dofaselected no de.Anotherp ossibilitytoaccessthegraph
view,not yet implemented,is the identication and extraction of ap ortion of
a web site by means of pattern matching techniques. Recurrent patterns are
exp ected tob e usedinthedesignof websites (forexampletree,hierarchy,full
connectivity,indexed-sequence).
5 TestWeb
Web sites can involve a complex interaction among Web browser, op erating
systems,plug-inapplications,communicatingproto cols,Webservers,databases,
server programs (for example CGI programs) and rewalls. Such complexity
makes the test of Web Sites a great challenge [8].Ideally all comp onents and
the extreme timepressure under which Websystems are develop ed. Available
testingtechniques [8]dieronthefeatures oftheWebsitewewanttotest.For
example it is p ossible to execute link testing, HTML validation, p erformance
testing, andsecurity testing.We areinterestedindynamic validationusingour
UML mo delas abase for this typ e of testing. In general, dynamic validation
metho dsaimatexercising thesystembysupplyingavector ofinputdata(test
case)andcomparingtheexp ectedoutputswiththeactualonesafterexecution.
Inparticular,weconsidered whitebox testingofWebApplications:theinternal
structure ofaWebapplicationisaccessed tomeasure thecoveragethat agiven
testsuite(collectionoftestcases) reaches,withresp ect toagiventestcriterion
(statingthefeaturestob etested).Somewhiteb oxtestingcriteria,derivedfrom
thoseavailablefortraditionalsoftware[2],are:Pagetesting, Hyp erlinktesting,
Denition-use testing,All-uses testing,All-pathstesting. AtestcaseforaWeb
application is a triple: URL, input (a sequence of variable-value assignments
separated by the character '&'), typ e of parameter passing (GET or POST).
Execution consistsof requestingtheWebserver fortheURLinthetriplewith
theasso ciatedinputandstoringtheoutputpages.Satisfactionofanyofthewhite
b oxtesting criteria involves selecting aset ofpaths intheWebsite graph and
providinginput values. Since path selection is indep endent (conditional edges
excluded) frominputvalues,it canb eautomated.
Output
pages
Coverage
Pass/fail
model
UML
Test
cases
Executor
Test
values
Expected
Test Criterion
Generator
Test
dynamic pages
cond. edges
uses
Input values
Fig.7.Architecture oftheto olTestWeb.
TestWeb(see Figure 7) contains a test case generation engine (Test
gen-erator),ableto generate testcases fromtheUMLmo delof aWebapplication.
The user has to add some informationto the mo delpro duced by ReWeb to
complete it fortesting purp oses andfurthermore theuser has tocho ose atest
criterion. Theuser sp ecies the pagetyp e when thedistinction b etween static
anddynamicpagescannotb eobtainedautomatically(e.g.,dynamicpageswith
no input). The user also provides the set of used variables, use, for each
Additionalmanualinterventions,relatedtostate unrolling,willb edescrib ed in
thefollowingonanexample.GeneratedtestcasesaresequencesofURLswhich,
onceexecuted, grantthecoverageoftheselected criterion.Inputvaluesineach
URLsequence areleft emptybytheTest generator,and theuser hasto ll in
them,p ossiblyexploitingthetechniques traditionallyusedinblackb oxtesting
(b oundary values, etc.). TestWeb's Test executor can now provide the URL
request sequence of each test case to theWeb server, attaching prop er inputs
to each form. Theoutput pages pro duced bythe server are stored for further
examination.Afterexecution,thetestengineerintervenestoassessthepass/fail
result of each testcase. Asecond, numeric outputoftest case execution isthe
levelofcoverage reached bythecurrenttestsuite.
Regressiontestinghighlyb enetsfromtheautomationintestcaseexecution,
sinceeach testcasecanb ere-executed unattendedonanewversionoftheWeb
application, and its output pages can b e automatically compared with those
obtainedfromarun ofthepreviousversion.
5.1 TestGenerator
Given the graph representation of a Web application,a reduced graph can b e
computedforthepurp osesofwhiteb oxtesting:each staticpagewithoutforms
isremovedfromthegraphbyaCross-Termstepdescrib ed in[2](see Figure8).
A
D
B
C
E
A
A
B
B
B
B
D
D
C
E
C
E
A
Fig.8.StepoftheCross-Termalgorithmatano deselected forremoval.
Inthe resulting graph,a ctitiousentry no de is added, connected with all
no deswithnopredecessor,andactitiousexitno deisdirectlyreachablefromall
outputno des,i.e.,dynamicno deswithnonemptyuseattribute.Infact,theend
ofacomputationisreached,inaWebapplication,whensomeresultisdisplayed
totheuser,butnointrinsicnotionofterminationforanavigationsessionexists.
Dierentlyfromtheow-graphofastructuredprogram,thegraphview ofa
Webapplicationcancontainhorrible-lo ops[2],i.e.,theremayb eno desjumping
into or out of a lo op and/or there may b e more than one iterating no de for
the samelo op. In presence of horrible-lo opstheusual strategies used to cover
nested-lo opsandconcatenated-lo opsdonotwork.Wehavechosenageneral
so-lution:a testcase generation technique based on the computationof thepath
expression [2] ofthereduced Website graph.Apath expressionis analgebraic
representation of allpaths inagraph.Variablesina pathexpression are edge
ec-REDUCTION(graph)
1 Combine all serial links bymultiplying their expressions
2 Combine all parallel links by adding their path expressions
3 Remove all self-loops by replacing them this a link of the form
X
4
while(number of nodes in the graph
>2)
5
n
choose a node of the graph dierent from initial or nal node
6
Apply Cross-Term elimination to n
7
Combine any remaining serial links as in step 1
8
Combine all parallel links as in step 2
9
Remove all self-loops as in step 3
10
endwhileFig.9.Reductionalgorithm.
Computationofthepathexpressionforasitecanb ep erformedbymeansofthe
Reductionalgorithmdescrib ed in[2]anddepicted inFigure9.
Thelines 1-3of the Reductionalgorithminitializethepro cess and put the
graph in normal form. The b o dy of the commandwhile is executed until the
numb er of no des in the graph is greater than 2. Line 5 assigns ano de of the
graphdierentfromtheinitialornalno detovariablen,whileline6executes
aCross-Term step onno de n.This step eliminatesthe no den and transforms
the graphaccording to thediagramshown inFigure8. Lines 7-10combine all
serial andparallellinksandremoveself-lo ops. Attheend oftheexecution,the
path-expressionoftheinputgraphisobtained.
PATH GENERATION(path expression)
1
while
criterion not satised
2
for each
alternative from inner to outer nesting
3
choose
one never considered before, if any
4
or randomly choose
one
5
endfor
6
if
computed path increases coverage
then
7
add
it to the resulting paths
8
endif
9
endwhile
Fig.10.Theheuristictechniqueadopted toobtainthepathssatisfyinga
crite-rion.
Since thepath expression directly represents allpaths inthe graph, itcan
b e employed to generate sequences of no des (test cases) which satisfy any of
thecoverage criteria.Determiningtheminimumnumb erofpaths, fromapath
expression,satisfyingagivencriterionisingeneralahardtask.However,
heuris-ticscanb edened tocomputeanapproximationoftheminimum.Theheuristic
techniqueadopted forthisworkis basedontheschemeofFigure10(the
alter-native at line 2 for alo op is whether to re-iterate or not). Denition-use and
such as denition-use and all-pathstesting, for which thecoverage of p ossibly
innitepathsshould b eachieved,couldrequirethat onlyindep endentpathsb e
consideredor thatlo opsb ek -limited.
:Form
input = {va}
:Form
input = {va}
:Form
input = {va}
:Form
:Form
:ConditionalEdge
:Form
input = {va}
:Form
input = {jump}
thesaurus.htm
thesaurus
:Form
input = {va}
dictionary.htm
dictionary
:Form
input = {va}
:Form
a
b
c
e
d
input = {jump}
c = ’#entries(va) > 1 and
not (jump selected)’
Fig.11.Mo delofap ortionofwww.m-w.com,includingtwoconditionaledges.
Figure11showsthep ortionoftheWebsitewww.m-w.comwhichprovideson
line access to the Merriam-WebsterEnglishdictionary and thesaurus. A word
can b e entered inthe initialpage (eitherdictionary.htmor thesaurus.htm).
A dynamicpagedictionary isthen comp osedinresp onse to theinput word,
storedinvariableva.Thecontentoftheresultingpagedep ends onthenumb er
of entries foundinthe dictionary.If thereare morethan one entry,aselection
listisdisplayedtotheuser,togetherwithanexplanationofthemainentry.The
usercancho oseamongthealternatives{theselectionisstoredinvariablejump
{andmovetoapage,stillnameddictionary,withanexplanationofsuchan
entry. The mo del of the site represents the conditionalexistence of the list of
alternativeswithaConditionalEdgeobject asso ciated withtheedgelab elledb.
Thesiteoersthep ossibilitytoenteranewwordfromb oththedynamicpages
dictionaryand thesaurus,andallowsswitching fromdictionarytothesaurus
andviceversafromtheinitialandresult pages.
Inordertodetermineasetofpathstob eexercisedduringwhiteb oxtesting,
thepathexpressioniscomputed.Letusconsiderthep ortionofthesitedevoted
to the extraction of entries fromthe dictionary (symmetricconsiderations can
b e made for the thesaurus). The path expression asso ciated with the lab elled
edges of Figure 11 is a(bc+de)
. Some of thepaths generated fromit can b e
traversedonlyifprop erinputsareprovided,whilesomeotherpathsareinfeasible
for every input. For example, the path abc can b e traversed only if the input
wordhas morethan one entry in the dictionary. This condition is quite easy
selected fromthelistdisplayedtotheuser,thenext dynamicpagedictionary
that is obtainedwillnotinclude the listof alternative entriesanylonger. This
conditionisrepresentedas 'not (jump selected)'inFigure11:ifaselection
was p erformed by the user, edge b do es not exist. As a consequence the path
expressioncannotb eeasily exploitedforpathgeneration.
dictionary2
dictionary1
:Form
input = {va}
:Form
input = {jump}
:ConditionalEdge
:ConditionalEdge
:Form
input = {va}
a
b
c = ’#entries(va) > 1’
d
c
e
h
g
f
c = ’#entries(va) = 0 or 1’
Fig.12.Pagedictionarywasunrolled intodictionary1anddictionary2.
The problemhighlighted ab ove derives fromthe p ossibility to use asame
dynamic page for dierent purp oses. Actually, some Web sites consist of just
one dynamic page which displays dierent information according to an
inter-nallyrecorded state oftheinteraction.Inotherwords,whileforstaticsitesthe
Web page is coincident with the state of the interaction, this is not
necessar-ily true with dynamic sites. A p ossible solution to this problemis to p erform
an op eration of state unrollingon thedynamicpages that are used to display
dierentcontents underdierent conditions.IntheexampleofFigure 11, page
dictionary is used for two purp oses: to prop ose a list of alternative
dictio-naryentries and toprovidethenal result ofthesearch,once asingleentry is
identied. Such two purp oses mayb e represented explicitly by the two pages
dictionary1and dictionary2intowhichtheinitialpageisunrolled (see
Fig-ure 12). Theconditions intheConditionalEdge objects arenowsimpliedand
verify only the numb er of dictionary entries. The path expression of such site
p ortionb ecomes(ac+bf+bhg c)(dc+ef+ehg c)
andallthepathsthatcanb e
generatedfromitarefeasible,providedthataninputwordwiththeappropriate
numb erofdictionaryentriesis selected.
6 Conclusions and Future Work
Weprop osedsomeanalysisandtestingtechniquesworkingonWebapplications.
Thestartingp ointfortheirdenitionisamo delofWebsites,designedtoinclude
all characteristics that are relevant from an architectural p oint of view. Page
downloadingand mo del construction were achieved by providinginput values
Facilitiesforthedisplayoftheresultingmo delareprovided bytheanalysis
to olReWeb, whiletest generation and execution isautomated byTestWeb.
Ourexp erience withReWebandTestWebsuggeststhat thechoiceofago o d
mo del is fundamental for b oth analysis and testing. We showed some views
and facilitiesof ReWeb on a real example (www.ubicum.it).Path testing in
presence of aninternalrepresentation of theinteractionstate can b esimplied
bymeansofastateunrollingop eration,whichwasalsodescrib ed withreference
toarealworldexample(www.m-w.com).
Ourfuture work will b e devoted to extending theset of analyses available
(to include,for example, pattern matching),adding abstraction techniques to
supp ortahighlevelviewofthesite,(partially)automatinginputselectionduring
testing in presence of conditional edges and providing b etter supp ort to state
unrolling.
References
1. A. V. Aho,R. Sethi, and J.D. Ullman. Compilers.Principles,Techniques,and
Tools. Addison-Wesley Publishin g Company,Reading,MA,1985.
2. B.Beizer.SoftwareTestingTechniques,2ndedition.InternationalThomson
Com-puterPress,1990.
3. G.Bo o ch,J.Rumbaugh,andI.Jacobson. TheUniedModelingLanguage{User
Guide. Addison-Wesley Publishin gCompany,Reading,MA,1998.
4. J. Conallen. BuildingWeb Applicationswith UML. Addison-Wesley Publishing
Company,Reading,MA,2000.
5. D.Eichmann.Evolvinganengineeredweb.InProc.oftheInternationalWorkshop
onWebSiteEvolution,Atlanta,GA,USA,Octob er1999.
6. E.R.Gasner,E.Koutsoos,S.North,andKiem-PhongVo.Atechniquefordrawing
directedgraphs. InIEEE-TSE1993,March1993.
7. Chien-HungLiu,David C.Kung,PeiHsia,andChih-TungHsu. Structuraltesting
ofwebapplications . InProc.ofISSRE2000,Internationalsymposiumonsoftware
reliabilityengineering,SanJose,California,pages84{96,Octob er2000.
8. EdwardMiller. Thewebsitequality challenge.-companionpap er:"website
test-ing".InProc.ofQW1998conference,901MinesotastreetSanFrancisco,CA94107
USA,1998.
9. Rob ertC.MillerandKrishnaBharat.Sphinx:Aframeworkforcreatingp ersonal,
site-sp ecic web-crawlers. InProc.ofWWW7,BrisbaneAustralia,April1998.
10. F. Ricca and P. Tonella. Visualizati on of website history. In Proc. of the
In-ternationalWorkshop on Web SiteEvolution, pages 30{33, Zurich,Switzerland,
2000.
11. F.RiccaandP.Tonella.Websiteanalysis:Structureandevolution. InProceedings
oftheInternationalConferenceonSoftwareMaintenance,pages76{86, SanJose,
California,USA,2000.
12. P.Warren,C.Boldyre,andM.Munro.Theevolution ofwebsites. InProc.ofthe
InternationalWorkshopon Program Comprehension,pages 178{185,Pittsburgh,
PA, USA,May1999.
13. Y. Zou and K. Kontogiannis. Enabling technologies for web-based legacy
sys-tem integration. InProc.ofthe InternationalWorkshop onWebSite Evolution,