• No results found

Improving Web Performance by Client Characterization Driven Server Adaptation

N/A
N/A
Protected

Academic year: 2021

Share "Improving Web Performance by Client Characterization Driven Server Adaptation"

Copied!
12
0
0

Loading.... (view fulltext now)

Full text

(1)

Improving Web Performance by Client Characterization

Driven Server Adaptation

Balachander Krishnamurthy

AT&T Labs–Research

180 Park Avenue

Florham Park, NJ

[email protected]

Craig E. Wills

WPI

100 Institute Road

Worcester, MA

[email protected]

ABSTRACT

Wecategorizethesetofclientscommunicatingwithaserver ontheWebbasedoninformationthatcanbedeterminedby theserver. TheWebserverusesthe information to direct tailoredactions. Userswith poor connectivitymaychoose nottostayataWebsiteifittakesalongtimetoreceivea page,eveniftheWebserveratthesiteisnotthebottleneck. RetainingsuchclientsmaybeofinteresttoaWebsite. Bet-terconnectedclientscanreceiveenhancedrepresentationsof Webpages,suchaswithhigherqualityimages.

Weexploreavarietyofconsiderationsthatcouldbeused by a Web server incharacterizing a client. Once a client is characterizedas poor or rich, the servercan deliver al-tered content, alter how content is delivered, alter policy andcachingdecisions,ordecidewhentoredirecttheclient toamirrorsite. Wealsousenetwork-awareclientclustering techniquestoprovideacoarserlevelofclientcategorization anduseittocategorizesubsequentclientsfromthatcluster forwhichaclient-speci ccategorizationisnotavailable.

Ourresultsforclientcharacterizationandapplicableserver actionsarederivedfromreal,recent,anddiversesetofWeb serverlogs. Ourexperimentsdemonstratethat arelatively simplecharacterization policycanclassifypoorclientssuch thattheseclientssubsequentlymakethemajority ofbadly performingrequeststoaWebserver. Thispolicyisalso sta-bleintermsof clientsstayinginthesameclassfor alarge portionoftheanalysisperiod. Client clusteringcan signif-icantlyhelp ininitially classifyingclientsfor whichno pre-viousinformationabouttheclientis known. Wealsoshow thatdi erentserveractions canbeappliedtoasigni cant numberofrequestsequenceswithpoorperformance.

Categories and Subject Descriptors

C.2.2 [Computer Systems Organization]: Computer-Communication Networks|Internet; H.5.3 [Information Systems]: InformationInterfacesand Presentation|Web-based interaction

General Terms

Measurement,Performance

Keywords

clientconnectivity,clientcharacterization,serveradaptation

Copyright is held by the author/owner(s).

WWW2002

, May 7–11, 2002, Honolulu, Hawaii, USA.

ACM 1-58113-449-5/02/0005.

1.

INTRODUCTION

Webperformancehasbeenakeyfocusofresearchoverthe last fewyears. User-perceivedlatencyhasastrongbearing onhowlonguserswouldstayataWebsiteandthefrequency withwhichtheyreturntothesite. AWebsitethatistrying to retain users thus has a strong incentive to reduce the \time to glass" (the delay between the browser click and thedeliveryanddisplayoftheresourceontheuser'sscreen). ForWebsitesthathaveacriticalneedtoretainusersbeyond the rst page there is a strong motivation to deliver the content quicklyto theuser. Giventhevagaries ofnetwork delays, presenceof intermediaries, and the user's network connectivity,theserverhasastrongincentivetodeliverthe mostsuitable(dynamicallygeneratedorstaticallyselected) contentquicklytotheuser.

Our work focuses on learning about the quality of the connectionbetweentheclientandservertoaidtheserverin makinganinformed decisiononwhatcontenttoserve. We seektoobtaintheclient'sconnectivityinformationbasedon information alreadyavailableat theserver. Analternative to this passive approach is to actively gather information aboutthemoredynamiccomponentsoftheend-to-end de-lay, such as the bandwidth of the client or delays in the network. Thiswouldhoweverrequireconsiderableamount ofactiveinformationgatheringindi erentlocales.

A motivation for our work is to concentrate on perfor-manceissues that are notdueto the serveritself. Server-induceddelayscanbereducedbyimprovedscheduling poli-cies or simply upgrading the server. We focus on other reasonsbehindaclientexperiencingpoorperformancesuch as low bandwidth, high latency,networkcongestion,delay atintermediariesbetweentheclient andserver,andaslow client. Aservermaynotbeabletoisolatethereasonsfora givenclient experiencingpoor performance butit cantake remedial action inselecting a lower quality version of the resourceorbyservingthecontentinadi erentmanner.

A Web site may have multiple variants of the same re-source. IftheWebserveronthesiteknewthattheclienthad poor(orrich)connectivity,thenamoreappropriatevariant of theresource could besent intheresponse. Alternately, theservercoulddeliverthesamecontentinadi erentway. Sucharesponsemightresultintheuserbeingmoresatis ed with thedeliveryspeedor thequality ofthe response and helpretaintheclientfortheWebsite.

(2)

that canbesent backis also notlikelyto be morethana few. Bymappingtheversionsoftheresourceinadvanceto thedi erentcategoriesofaclients,theappropriateresponse canbesentshortlyafterclientcategorization.

Forexample,theservercanchoosebetweensendingonly thebasedocument,thebasedocumentplusafewembedded resources, or sending the full container document. Apart fromsimplysendingadi erentresponse,byidentifyingthe client,theservercanusetheinformationtoguideavariety ofitspolicies. Forexample,theservermightdecidetokeep aHTTPpersistentconnectionopenlongerwithclientswho havepoorconnectivitytoreducetheirneedtosetupanew TCPconnection. Or,theservercanpiggybackcacherelated informationtoreducetheneedforfuturevalidationrequests. Inthis work we explore the potentialapplicability of each oftheseactions.

Thusthechallengeistobeabletoclassifyclientsina sta-blemannerintoafewmeaningfulcategoriesandenumerate a set of ways in which the classi cation can be used for meaningfulimprovements.SinceagivenWebsite'scontent or access patternmay or may notlend itself to bene ting fromsuchaclassi cation,we rstneedtomeasurethe pro-portionofretrievalsthatcanbene t. Similarly,thenumber ofclientswhocanbene tisanothermetrictogather. Oncea clienthasbeencharacterized,weneedtomapittoan appro-priateactionfor the serverto take. Well-connectedclients mayrequirenochangeintheactionstakenbyaserver. Ad-ditionally,wecanexaminegroupingclientsforclassi cation andapplicationofserveractions. Forexample,groupingvia network-aware clustering[14]isatechniqueweexplore.

Thecontributionsofthispaperareasfollows:

1. We propose taking advantage of information already availabletoaWebservertoclassifyclientconnectivity.

2. Based onthe stability ofthe client classi cation and the degree to which it can be trusted, we propose a rangeofserveractionsthatcanbetaken.

3. Weevaluatethefeasibilityofourproposedclient clas-si cation policies and server actions by examining a heterogeneous collectionofWebserverlogs.

The rest of thepaperis organized as follows: Section2 discussesthevariouswaysinwhichwecancategorizeclients andtherangeofavailableinformationonwhichtobasethis decision. Section3 enumeratesthe rangeof serveractions possible for the di erent categories of clients. Section 4 presentsourmethodologyforcategorizingclientsandev alu-atingthepotentialapplicabilityofthevariousserveractions. Section5presentstheresultsofourexperimentscarriedout using a heterogeneous collection of large server logs. We concludewithadiscussionofrelatedwork,asummaryand plansforfuturework.

2.

CLIENT CHARACTERIZATION

The rststepinbeingabletoadaptcontentoritsmanner ofdeliveryfor aclientistoidentifytheclient's characteris-tics. Oneapproachisforclientsto identifytheir character-isticsthemselves. Clientsdothistoalimitedextentalready by specifying the contenttypestheyare willingto accept. Inaddition,clientscouldspecifytheirnetworkconnectivity

dicatetheirconnectivityinformationviatheCC/PP (Com-positeCapabilities/PreferencesPro les[8])mechanism. CC/ PP allows user agents and proxies to specify their capa-bilitiesandenablesHTTPcontentnegotiation(Section7.9 of [13]) with servers. The problem with theseapproaches is that even if available, many clients may not use them fornormalWebbrowsing. Inaddition,aclient'sexperience may varydepending onthe time of day or the numberof networkhopsbetweentheclientandtheserver.

Intheabsence ofexplicit classi cationinformation from the client, itis usefulfor a Webservertobe able to char-acterize a client. There are a number of potential pieces of information available to a server for such classi cation. We consider threetypesof classi cationbased onnetwork connectivity,responsetimeandotherconsiderations.

2.1

Network Connectivity

The rst type ofinformation involvescharacterizing the nature of thenetworkconnectivity betweena client and a server. Ideallytheserverwouldliketoknowtheround-trip time (RTT),bandwidth, and the amount ofcongestion in thepathtotheclient. Inpractice,theWebservercanonly makeinferencesbasedonthenetworktraÆcitreceivesfrom the client. For example, the servercan estimatethe RTT bymeasuringthetimebetweenacceptingaTCPconnection fromaclient andreceivingthe rstbyteof thesubsequent HTTP request. This interval requires a single IP packet round trip. Thisapproachissimpleandintroducesno ad-ditionalnetworktraÆc,althoughtypicalWebserverswould needtobeinstrumentedtomeasurethisvalue.

WhileeachTCPconnectionmadebyaclienttotheserver canbe usedtore nethe RTTestimationfor theclientby the server,it ismore diÆcultto estimatebandwidth char-acteristicsfor thenetworkconnection. Tools thatestimate bandwidth between two nodes typically work by sending packetsofdi erent sizesbetweenthe nodesandmeasuring therespectiveroundtriptimes. Examplesincludebing[5], pathchar[11]andmeasuringend-to-endbulktransfer capac-ity[2]. ThisapproachwouldaddoverheadforboththeWeb serverandtheclient.

AnotherapproachthattheWebservercanusetoestimate theRTTattheHTTP-levelistoinitiallyrespondtoaclient requestwithaHTTPredirectresponse(302 Found),which would typically cause the client to retrieve the redirected content. Thedi erencebetweenthe tworequesttimescan thenbeusedtoestimatetheRTTbetweensuccessiveHTTP requests. Unfortunatelythis approachintroduces an addi-tionalHTTPtransactionandfurtherlengthenstheresponse timefortheclient.

Adi erentapproachattheHTTPlevelusesthefactthat browsers typicallyautomatically requesttheset of embed-dedobjects for thepageafterloadingthe container object for aWebpage. Measuring the timebetweentheretrieval ofthebaseobjectandthe rstrequestedembeddedobject canbeusedtoestimatethetimebetweensuccessiveHTTP requests without necessitating additional requests or redi-rectionsolelyfor thepurposeofmeasuring.

2.2

Response Time

(3)

tofocusontheresponsetimeseenbytheclientandnotbe directlyconcernedwith the natureof the network connec-tion.Thisoutcome-orientedcharacterizationfocuseslesson thespeci ccausesofpoorperformanceandmoreon identi-fyingthesituationswhereitoccurs.

OneapproachforestimatingresponsetimeforaWebpage istoagainusethefactthattypicallybrowsersautomatically downloadembeddedobjectsonapage. Usingthis observa-tion,themeasuredtimebetweenservingthebaseobjectand thelastembeddedobjectontheWebpageapproximatesthe totalresponsetimeofthepagefortheclient. This measure-mentcanbedonewithoutintroducinganyadditional Web traÆc.

A moreaccurateestimateof thetotal delay experienced bytheclient(vs.theserver)canbeobtainedby instrument-ingapagewithJavascript,whichexecuteswithintheclient's browser,to measurethe totaldownloadtimeand reportit back to the Web server[22]. Unfortunatelythis approach requiresexplicitinstrumentationofpagesandintroduces ad-ditionalHTTPtraÆc combinedwiththe requirementthat browsers accept Javascript. TheWebservercanalso mea-surethe numberofconnection aborts and resetsfrom the clientwhichmaysuggestthepresenceofpoorly-connected, impatientclients[7].

2.3

Additional Factors for Characterization

Thereareseveralotherfactorsthatcouldbeusedin classi-fyingaclient. Informationintheclientrequestheadersuch asacceptedcontenttypes,theHTTPprotocolversion[12], andtheclientsoftware itselfcouldall be exploitedin clas-sifyingthecapabilities oftheclient. Onceaclienthasbeen classi ed,thisclassi cationcouldbe storedincookies gen-eratedbyaserverandincludedbytheclientinsubsequent requeststothatserver.

Browserorproxycachingcanalsoa ecttheclassi cation of a client. For example, if a client requests only a few of theembedded objects on aWeb pagethat is knownto havetensofembeddedobjects,areasonableinferenceisthat theclient or anintervening proxyhas many ofthe objects cached. Clients withor behindan e ective cache may be consideredricher ifmanyneededobjectsarealreadyinthe cache. Ontheotherhand,aproxyinthepathmayintroduce additionaldelayforrequestsfromaclienttotheWebserver andmakethatclientappearpoorer.

A nalconsiderationisidentifyingwhichclientstoclassify. Themostobviouscandidatesforclassi cationare thosefor whomresponsetimeisimportant|usersemployingbrowsers. In contrast, automated clients such as spiders should be ltered out and not be considered for classi cation. Re-cent workhasexamined thedetection of spidersat a Web server[14,3].

3.

POTENTIAL SERVER ACTIONS

There a numberof possible actions that a server could potentiallytakeaftercharacterizingaclient. Therearetwo broadclassesofactions: thosethatalterthecontent,which can be applied for either rich or poor clients; and those thatalterhowthecontentisdelivered,whichareprimarily applicableforpoorclients. Indecidingwhichactiontotake the servercan use characteristics of the client or consider characteristics of a client group, such as those formed by

3.1

Alteration of Content

Given arangeof content variants,aservercould choose alargerandpotentiallyenhancedvarianttoservetoricher clientsandareducedversionforpoorerclients. Theserver could provide areducedversion by includingfewer, if any, embeddedobjectsorbyincluding\thinner"variantsof em-beddedimages.

3.2

Selection of Replica or Mirror

The rstaction aserverperforms whenit receivesa re-questistodecideifitisgoingtobehandledonthemachine wheretherequestarrived.OftenbusyWebsiteshavealarge collectionofmachinesbehindswitches orredirectorswhere the content may be stored or generated. Popular search engines andother busynews sites usethis approach. The frontendservercanuseclient'sconnectivityinformationto guideselectionoftheback-endmachineifthecontentstored orgenerated inthosemachinesispartitionedonthis basis. Afterchoosingthepropermirror,themannerinwhichthe Request-URIismappedtoaspeci cresourcecanbe addi-tionallytailoredbasedontheclient'sconnectivity.

Additionally,sitesthatuseContentDistributionNetworks (CDNs), may have some resources delivered from mirror sites distributed on the Internet. Often the choice of a particular mirrorsiteis left to theCDN;however, the ori-gin servercanprovidehintsaboutthe client'sconnectivity whichcanbeusedbytheCDNinselectingthemirror.

3.3

Alteration of Meta-Information

Well-connectedclientsandproxiesaremorelikelyto pre-fetchor prevalidate documentsto lower user-perceived la-tency.Theheuristicassignmentofexpiration meta-informa-tion toresources canbe tailoredbyservers toensure that poorly connected clients can avoid needless validation of less-frequentlychangingresources. Origin serverscould is-sue hints via headers so that proxies between poorly con-nectedclientsandtheoriginservercanincreasethefreshness intervalfor resources. By providing information to caches alongthepath,theoriginservercanhelpguidecaching poli-ciesattheintermediaries.

As discussed in earlier work [15, 16, 10], hints can be providedbytheserveraboutrequestpatternsofresources. Groupingresourcesintovolumesandtailoringthemtosuit the client'sinterestscane ectivelyprovideuseful informa-tion. Now,anadditional factorcanbe usedby theserver indecidingthe setofhintsto bedelivered totheclient. A richly connectedclient might receive extended set ofhints while a poorly connectedclientmight receive amore suit-ablytailoredsetofhintsornohintsatall.

3.4

Altering Manner of Delivery of Content

(4)

multi-to reduce the size of the bundleand use adelta-encoding mechanism to prevent bundlingof objects already present intheclient'scache.

3.5

Guiding Policy Decisions at the Server

Oncea serverhas returnedaresponse, itsabilityto use connectivityinformationdoesnotend. InHTTP/1.1, con-nectionsbetweenaclient andaservercanpersistbeyonda singlerequest-responseexchange. TheHTTPprotocoldoes noto erany guidelinesondecidingwhenapersistent con-nectionshouldbeclosed(seeSection7.5.5of[13]). Aserver candecidewhento close apersistent connection based on avarietyoffactorssuchasfairness,potentialoffuture con-nectionsfromthesameclient,durationforwhichconnection was already open,etc. If aserverknows that aclient has poorconnectivity,itmightwishtokeeptheconnectionopen longerthanusualtoensurethattheclientdoesnotpaythe overheadofhavingto teardownand setupanew connec-tion.A richlyconnectedclientcoulda ordtheoverheadof settingup a new connection. Additionally,the server can assignahigherpriorityto poorlyconnectedclientssothat theirrequestsarehandledfastertoreduceoveralllatency.

4.

METHODOLOGY

RatherthaninstrumentaWebservertocharacterizeclients and evaluate the potential of taking di erent actions, the initialapproachwehavetakenistoanalyzethelogsfroma varietyofWebserversites. Byusinglogsfortheanalysiswe canexamineawidervarietyofsites. Wetakeadvantageof thefactthatclientbrowsersaretypicallycon guredto auto-maticallydownloadembeddedobjectsforacontainerpage. Wecanthususethedelaysbetweenwhenthecontainerpage andsubsequent embedded objects areservedas ameasure oftheconnectivitybetweenaclientandtheserver.

For our study, we assembled a heterogeneous collection ofrecentserverlogsrangingindurationandnumberof ac-cesses. Thedataincludedlogsfromauniversitydepartment, aspecializedmedicalsite,alargecommercialcompany,a re-searchorganization,andasportingsitemirroredinmultiple countries.

4.1

Log Analysis

Notallofthelogsweusedwereinthesameformat,but allofthelogscontainedthefollowing eldsfor eachlogged request: the requesting client (identi ed by IP address or stringrepresentation),theRequest-URI,theresponsecode, thedateandtimewhentherequestwascompleted,andthe responsesize. Inaddition,someof the logsalsocontained referreranduser-agent elds,whichspecifytheURIthat re-ferredthecurrentrequestandtheagentsoftwarethatmade therequest.

The rst step in our analysis was to use the Request-URItoidentifyrequestsindicatinganHTMLobjectorthe outputofaserver-sidescript. Wemarkedtheserequestsas container documentsand counted the number of requests foreachofthem. Wethendeterminedthesetofpagesthat accountfor70%ofaccesseswithinthelogtoconstitutethe popularsetofpagesatthatsite. Wefocusedfurtheranalysis onlyonthesepagessincethebene tsforourapproachare mostlikelytobeappliedtothemostpopularpages.

Afterpreprocessing thelogtoidentifythepopularsetof

quences ofrequests.Wede neasequenceasasetof consec-utiverequestsforabaseobjectand atleastoneembedded object fromthesame client withthefollowing characteris-tics:

 The rst(base)requestinthesequencereturnsHTML content based on Request-URIindicating an HTML objectortheoutputofaserver-sidescript.

 Eachsubsequentrequestinthesequenceindicatesthe Request-URIisforanimage,stylesheetorJavascript object.

 Ifthereferrer eldisavailableinthelogthenthe refer-rer eldfortheembeddedobjectsmustalsomatchthe URIofthebaseobject. Thisadditionalcheckhelpsto eliminatearelativelysmallnumberofsequenceswhere theembeddedobjectsdidnotmatchwiththebase ob-ject.

 Finally,wesetanarbitrarymaximumthresholdof60 secondsbetweenanyobjectandthetimethebase ob-jectis downloaded. Anysequencethatspansthis du-ration is already going to be classi ed as poor and makingthethresholdlargerincreasesthepossibilityof groupingrequestsforobjectsondi erentpageswithin thesamesequence.

Thisanalysisisnotguaranteedtoobtainallsequencesofa basepagefollowedbyitssetofembeddedobjects. For exam-ple,itwillnotworkforpageswithembeddedHTMLframes. Itmayalsofailwhenmultipleclientsbehindthesameproxy are makingrequests atthe sametime tothe server. How-ever, it is not critical that we identify all sequences, but merelyasuÆcientnumbertocharacterizeclients.The anal-ysisproducessequencessuchasthesampleshowninTable1 wheretheretrievalofabaseobjectof12221bytesisfollowed bythe retrievalof10 embedded objects overthe courseof 28seconds. Twooftherequestsresultedinaresponsecode of 304 Not Modified for which no content bytes were re-turned. While a ner granularity than one second would be preferable, this coarse granularity does provide enough information toidentifytherelative durationofasequence, whichisthefocusofourwork.

Table 1: Sample Sequence

Object Response Content Delay(insec.)

Number Code Size FromBaseObject

0 200 12221 -1 200 183 2 2 200 1577 2 3 304 0 3 4 200 3322 4 5 200 2133 4 6 200 898 7 7 200 2803 8 8 304 0 11 9 200 400 11 10 200 2803 28

(5)

ob-all spider requests that we can identify (e.g., Googlebot, Scooteretc.). However,subsequentanalysis shows such l-teringhaslittlee ectonthenumberofsequencesweidentify inalog.

Asanadditionalanalysistool,wealsoclusteredthesetof uniqueIPaddressesineachlogusingthetechniqueoutlined in[14]. This clustering was carried outbya small C pro-gramthat usesafastlongestpre xmatchinglibrary. This softwareiscapableofclusteringmillionsofIPaddresses us-ing a large collection of BGP pre xes (over 441K unique pre xes)inafewseconds. Incaseswherethelogsrecorded domain names versus IP addresses, we performed a DNS lookuponeachnamepriortoclustering. Incaseswherethe DNSlookup failed or the clustering technique was unable to cluster a client with other clients then that client was treatedasitsownclusterforsubsequentanalysis.

4.2

Client Characterization

Onceweidentifythesetofsequencesinalog,weusethe characteristics of thesequences for a client to characterize that client. We choose threesimple categories for charac-terizing aclient: poor, normaland rich. Weuse the term \poor"torefer toclientswhogenerallyexhibitlongdelays betweenretrievalofthebaseobjectandsubsequent embed-dedobjects. Incontrast, we use the term \rich" to refer to clients who consistentlyexhibit little delay between re-trievalofthebaseobjectandsubsequentembeddedobjects. Weusetheterm\normal"torefertoclientswhocannotbe classi edaseitherpoororrich.

Weusedtwometricsinclassifying aclient: thedelay be-tween the base object and the rst embedded object and the delay between the base object and the last embedded object in a sequence. The rst metricmeasures the time foraclienttoreceiveatleastpartofthebaseobject,parse it for embedded objects and for the server to receive and processtherequestforthe rstsubsequentobject. The sec-ondmetriccorresponds tohow long aclient mustwait for all of the embedded objects for a pageto bedownloaded. Whilethisvaluewillvaryaccordingtothesizeandnumber ofobjects,itisareasonablemeasureinidentifyingthedelay characteristicsofclients.

Foreachofthesetwometrics,wede nedcumulative val-ues E

first and E

last

to represent long-term estimates for these two metrics for each client. To both minimize the amount of information stored for each client and to give greaterweighttomorerecenthistory,wechosetousea ex-ponentiallyweightedmeanwherethevalueforE

first (E

last issimilarlyde ned)isgivenas:

Efirst= Efirst+(1 )Emeasured

whereEmeasured isthe current measurement for the delay betweenthebaseand rstembeddedobject. Thevalue is theweightingfactorand takesonvaluesbetweenzero and one. Inourstudyweexperimentwiththreevaluesof : 0.0, 0.3, and 0.7. Notethat inthe case where =0.0 onlythe lastmeasurementisusedinclassifyingtheclient.

Usingthesede nedvaluesfor E first

andE last

wede ne thresholdsforwhatidenti esapoorandrichclient. Build-ing onworkfrom [17], Nielsen suggests downloadtimes of greaterthan10secondscausesdiscontinuitiesforauser[21]. Bickfordsuggeststhatusersarenolongerwillingtowait

be-Ratherthande nesuch xedthresholdsinourstudy,weuse resultsofthesestudiesasguidelinesinidentifyingthreesets ofthresholdsforthede nitionofaclientexperiencingpoor performance(alltimesareinseconds):

1. ifEfirst>3orElast>5 2. ifEfirst>5orElast>8 3. ifE first >8orE last >12

Wede nedandexploredonlyonethresholdsetfor what de nes a client experiencing good performance and hence warrantsclassi cationasarichclient:

ifEf irst

<=1andElast<=2.

Weused thesethresholdsto de ne whethera client was poor orrich(ornormal ifitmetneithercriteria). Wealso exploredtwolevelsofcon dencewitheachpolicy. The rst con dencelevelwasmoreaggressiveinclassifyingclientsas rich or poor immediately after the rstsequence was pro-cessed for aclient. Thesecond con dence level was more conservativeinkeeping theclassi cationofaclient as nor-maluntilatleastsixsequenceswereprocessedfromaclient. Theuseofclientclusteringwasalsoexploredinourstudy where weuse the accessesfrom all clientswithin a cluster tocharacterizeaclusteraspoor,richornormal. Whenthe initialrequestsequenceisreceivedfromaclientthedefault istoalwaysclassifythatclientasnormal. However,incases where an initial sequence is requested by a client who is part ofaclusterseen earlier thenthecurrent classi cation forthatclusterisassignedastheinitialclientclassi cation. WealsorecognizethatabusyWebservercannot realisti-callystorestateaboutclientsoveralongperiodoftime. We usegarbagecollectiontoremovethestateandclassi cation for clients whenthe last pageaccessis morethanoneday oldinthelogs. Wenote,however,thatitmightmakesense toretaininformationlongeronaper-clusterbasis.

Fourparameters|theweightingfactor,thethresholdsfor de ning a poor client, the con dence level and the use of client clustering|werestudiedinthis work. Table 2 sum-marizesthe36combinationswestudied.

Table 2: SummaryofPolicyParameters

Parameter RangeofValues

Weightingfactor 0.0,0.3,0.7

Poorthreshold(Efirst,Elast) (3,5),(5,8),(8,12) Con dencelevelforclientaccesses 1,6

Useofclientclustering no,yes

4.3

Evaluation of Characterization Policies

(6)

as successful if a high percentage ofbad accesses ina log are done by clients classi ed as poor and a high percent-ageofgoodaccessesinalogaredonebyclientsclassi edas rich.Thehigherthesepercentages,themorepossibilitiesfor theservertotakeaction,althoughtheaggressivenessofthe classi cation mustalso be tempered by potentialnegative e ectsof incorrect classi cation. For example, serving en-hancedcontenttoaclientclassi edasrich,whoisactually not richmay lead to negative consequences. Inanalyzing variousclassi cationpolicies,wewillexaminethetradeo s betweenmoreandlessaggressiveclassi cationpolicies.

Second, we keeptrackof the identity ofclients who can bene t from our analysis. We estimate the percentage of clientswho arestable; i.e.,ones thateitherneverorrarely movebetweenthecategories. Themorestable the classi -cationforaclient,themorelikelyisitthataservercantake tailoredaction. Clientswithunstableclassi cations arenot goodcandidatesforserveractions.

4.4

Applicability of Potential Server Actions

In Section3we identi eda numberof potentialactions thataWebservercouldapplyifithascharacterizedaclient as poor or rich. In addition to usingthe Web serverlogs to classifyclients,we also identi edpoorly performing se-quencesanddeterminedwhichoftheseactionscouldbe ap-pliedtothesesequences.

A variety ofconsiderations shouldbetakeninto account byaserverbeforedecidingontheapplicability ofa partic-ular action to a sequence. Amongthe considerations are thelengthofthesequence(somesequencesmayhavetobe long beforetheyare considereduseful enough to triggera particularserveraction),thesize ofthefull container doc-ument, or thesize ofthe base page. InTable 3, we list a seriesofpossibleserveractionsandthesequencecriteriato beexaminedfor eachaction.

The rstactionofreducingtheamountofcontentcanbe appliedto any page, butis most likely to improve perfor-manceforpagescontainingmanyimagesorbytes. However, this actionmay notbe satisfactorybecause itchangesthe contentof thepage. Redirectingthe client toa mirrorvia HTTPorDNSredirectioncanalsobeappliedtoanypage, butitismoste ectiverelativetootherserveractionswhen thebadperformingsequenceisduetoalongRTTbetween client and server. Wethus ag all sequences witha large delayforthe rstobject.

Alteringthemeta-informationcanbeusedtoextendthe freshnessintervalorpiggybackvalidations[15]toreducethe number of unnecessary 304 Not Modified responses. We measurethe potential applicability of thisactionby deter-miningthenumberofbadperformingsequencesthatcontain any304 Not Modifiedresponse.

Thelasttwoserveractionsexaminechangingthe wayin whichthecontentis delivered. The rstapproachis to re-ducetheamountofcontentbycompressingit. Wemeasure itsapplicabilitybydeterminingthenumberofbad perform-ingsequenceswithalargenumberofbytes.Secondly,wecan altercontentdeliveryby improvingtheserverperformance ofhandlingmanyrequestsfromaclientthroughactionssuch aslongerpersistentconnectionsorincreasedscheduling pri-orityforthis clientattheserver. Other techniquessuchas thebundlingofcontentcanalsobeused.

5.

RESULTS

Tostudytheclientcharacterizationpoliciesand applica-bility ofserveractions,wecollected aheterogeneous setof recent server logsfrom eight di erent Webserversites for thestudy.Descriptionsfor theseWebsitesareasfollows:

 largesite.com|alargecommercialsite,

 research.com|acommercialresearchorganizationsite,

 wpi.edu|aneducationalsitefortheWPIcampus,

 cs.wpi.edu|asitefortheWPIComputerScience De-partment,

 bonetumor.org|a medical site devoted to bone tu-mors,and

 cricinfo.org|asportssitedevotedtocricketwithlogs from three mirror sites in Australia, U.K., and the U.S.A.

AlllogsarefromtheApril{October,2001timeframe ex-ceptforthecricinfo.orglogs,whicharefromJune2000. The duration, numberofuniqueclientsandnumberofrequests foreachlog areshowninTable4alongwiththenumberof sequences we were able to identify usingthe methodology describedinSection 4. The last columninthe table indi-cateswhether boththeRefererand User-Agent eldswere available for eachlog entry. We should note that we had additional sets of logs for thesesites for similardurations. Althoughwe showresults for onlyoneset,wedid analyze theothersetsandtheresultsweresimilartothosereported here.

5.1

Client Characterization

Usingtheselogs,we rststudiedtheclient characteriza-tionpoliciesdescribedinSection4.2. Whilewestudiedall policy parameters described inthis section for eachof the logs shown in Table 4, we focus our discussion of results to particular policy parameters and logs for clarity. For example, all results shownuse (5,8) i.e., the thresholds of Ef

irst

> 5seconds or Elast >8 seconds as de nitions for identifyingclientswithpoorperformance. Usingthresholds of(3,5) and(8,12)changesthenumberofpoor performing clientsidenti edineachlog, butdoesnotchange the rela-tive performance of thepolicies. Variations betweenother parametersandlogswillbenotedasappropriate.

5.1.1

Identifying Sequence Performance

(7)

Serveraction Natureofsequencestowhichtoapply Reducethenumberofobjectsorbytes Longsequenceorlargenumbertotalbytes Redirecttoreplicaormirror High rstobjectdelay

Altermeta-information Presenceofunnecessary304responses

Altercontentdeliveryviacompression Largenumbertotalbytes

Altercontentdeliveryby Longsequence

keepingsessionsopenlonger

Table 4: Statistics About theWeb SiteLogsUsedinthe Study

#ofUnique #ofRequests #ofPage Referer&User-Agent

Log Duration Clients (Millions) Sequences FieldsAvailable?

largesite.com 1wk 3385079 48.4 1210966 yes research.com 1mo 384679 5.2 250548 yes wpi.edu 1wk 59844 6.0 270749 no cs.wpi.edu 1mo 37433 1.0 63007 yes bonetumor.org 7mo 65397 1.7 143867 yes aus.cricinfo.org 1wk 25937 3.3 175496 no uk.cricinfo.org 1wk 76335 6.7 262254 no usa.cricinfo.org 1wk 38533 2.2 98908 no

performanceshowsalastembeddedobjectdelayofnomore than2seconds.

Asa baselinefor measuringthe success ofclassi cation, we rstexaminethepercentageofbadperformingsequences ineachlog. ThesecondcolumninTable 5shows the per-centage of bad performing sequences where the delay for downloadingthelastembeddedobjectisgreaterthan8 sec-onds. Thisvaluerangesfrom9.2%inthe cs.wpi.edulogto 38.5%intheUSAcricinfo.orglogs. BecauseitisdiÆcultto classifyclientsuntilatleastonesequencehasbeenaccessed byaclient,thethirdcolumninTable 5showsthe percent-ageof bad performingsequences from aknownclient who hasmadeatleastonepreviousrequest. The nalcolumnin Table 5shows thepercentage ofbad performingsequences fromaclientwho haspreviouslyrequestedat leastone se-quenceorisinaclusterforwhichanotherclientinthe clus-ter has previouslyrequested a sequence. Incases such as largesite.com,thedi erencebetweenthelattertwocolumns indicatesthatclientclustering canhaveasigni cant e ect ontheraisingthepotentialabilitytocharacterizeclients.

Table 5: Pct. ofAccesseswithBadPerformance

All FromKnown FromKnown

Log Bad Clients Clusters

largesite.com 30.5 13.6 26.5 research.com 10.2 4.8 9.0 wpi.edu 14.7 10.9 13.4 cs.wpi.edu 9.2 3.8 4.4 bonetumor.org 29.3 16.9 25.4 aus.cricinfo.org 37.3 29.7 35.4 uk.cricinfo.org 35.2 26.5 33.8 usa.cricinfo.org 38.5 25.1 34.9

As a similar baseline, Table 6 shows the percentage of goodperformingsequencesineachlog. Thesecondcolumn

quenceswherethedelayfordownloadingthelastembedded object isnomore than2seconds. The respective columns are similarlyde nedasthoseinTable 5andshowthat be-tween 28.8% and 78.8% of all accesses show good perfor-mance. Clusteringraisesthepercentageofgoodperforming accessescomingfromclientclusterswithrepeataccesses.

Table 6: Pct. ofAccesseswithGood Performance

All FromKnown FromKnown

Log Good Clients Clusters

largesite.com 38.3 24.3 33.0 research.com 72.7 39.3 64.9 wpi.edu 67.5 62.2 65.5 cs.wpi.edu 78.8 53.8 60.9 bonetumor.org 47.9 27.1 41.1 aus.cricinfo.org 28.8 27.3 28.3 uk.cricinfo.org 34.3 27.6 32.4 usa.cricinfo.org 34.5 28.7 32.0

5.1.2

Poor Client Characterization Policies

Giventhepercentageofbadandgoodaccessesineachlog, thebestpolicyforcharacterizingpoorclientsmaximizesthe numberof bad accessesassociated with poor clientswhile minimizing the numberof good accesses by these clients. We lookedat the range ofpolicies summarizedby the pa-rametersinTable2.Fortheinitialdiscussionofourresults weusethelargesite.comlogs.

(8)

0

10

20

30

0

10

20

30

40

50

Pct. of All Accesses with Bad Performance

Pct. of All Accesses with Good Performance

α

=0.7,cf=1

α

=0.7,cf=6

α

=0.0,cf=1+clusters

α

=0.3,cf=1+clusters

α

=0.7,cf=1+clusters

Best

Worst

Figure 1: Accesses from Clients Classi ed as Poor forSelected Policies (largesite.com)

Theinterior pointsinFigure1representresults for par-ticularpolicyparameters. The\ =0.7,cf=6"point inthe lower left corner is a relatively conservative classi cation policybecauseitrequires acon dence(cf)of atleast6 se-quenceaccessesfromaclient beforeitis willingtoclassify theclientaseitherpoororrich. Usingthispolicyonly3.4% ofall accessesare badand comefroma clientclassi ed as poorwhile2.1%ofallaccessesaregoodandcomefromsuch clients. The\ =0.7,cf=1"policyclassi esaclientassoon asonesequenceaccess fromtheclient hasbeenmade. Us-ing thispolicy only9.3% ofall accessesare bad and come fromaclientclassi edaspoorwhile5.7%ofallaccessesare goodandcomefromsuchclients. Other valuesfor these con dencelevelsshowsimilarresultsandare omittedfrom thegraphforclarity.

The remaining three points on the graph show results where the characterization for all clients within a cluster isusedtoinitializetheclassi cationfor anypreviously un-seenclientwithinacluster. Resultsfor allvaluesof with acon dencelevelofoneaccessneededforclassi cationare shown. These resultsshowamuchhigherlevelofaccesses thatarebadandmadebyclientsclassi edaspoor,butalso anincreaseinthenumberofaccessesthataregoodbythis set of clients. Using =0.7, 19.1% of all accesses have a delay for the last embedded objectgreater than8 seconds withtheseaccessesdone byaclientcharacterizedas poor. Inthese cases the servercould potentially take an action, suchasloweringthenumberofembeddedobjects,toreduce thistime. Ontheotherhand,usingthissame value,8.3% ofallaccesseshaveadelayoflessthantwosecondsforthe last embedded object and were madeby aclient classi ed aspoor.

The intention of showing these di erent policy parame-tersis nottoshow whichisbest,buttoshowtherangeof tradeo sthatexist. Thecorrecttradeo dependsonwhatis importanttothesiteandwhatarethenegativeimplications ofan incorrectclassi cation. If asite would like to retain clients who have poor connectivity to the serverthen the sitemightbe moreaggressive inclassifying clients, partic-ularlyifthe actionstakendonothave anovertlynegative e ectonaclient whomightbe misclassi ed. Otherwise,a

clientsthentheideaofclientclassi cationmaynotbeuseful atallforthesite.

TheresultsshowninFigure1doindicatethatbasing ini-tialclassi cationdecisionsonaccessesfromacrossacluster ofclientsisworthwhile. Moreaccesseswithbadperformance areassociatedwithclientsclassi edaspoor whiletheratio ofbad togoodperformingsequencesis aboutthe same,or abitbetter,thanwhenclusteringisnotused.

5.1.3

Poor Client Characterization

Figure1showsresultsfor di erent policyparametersfor asinglelog. Table 7shows resultsfor all logswithfor the policyparametercombinationof =0.7andaneeded con -dencelevelofoneaccesswithclustering.

Thesecondcolumnisthepercentageofaccesseswithbad performancefromclientsclassi edaspoorwiththerelative percentageofallaccesseswithbadperformancein parenthe-ses. Theresults for thelargesite.com andcricinfo.org sites showthe bestperformance withmorethan60%of all bad performing accesses from clients classi ed as poor. These are opportunities for the serverto take mitigating action. The research.com, wpi.edu and cs.wpi.edu sites generally have better connected clients (many on-site users for the wpi logs) so classi cation ofpoor clients isexpectedto be lesssuccessful.

The last column in Table 7 re ects the potential nega-tivee ectofmakinganincorrectclient classi cation. This columnshowsthepercentageofaccesseswithgood perfor-mancemadebyaclientclassi edaspoor. Therelative per-centage oftheseaccesses comparedtoall good performing accesses (second column inTable 6) is given in parenthe-ses. Theresults show2.4{10.6% of all accessesfall inthis categoryforwithrelativepercentagesfrom3.0{36.1%.

Table 7: Accesses by Client Classi ed as Poor ( =0.7, cf=1with clustering)

Bad Good

Performing Performing

Accesses Accesses

Log (%AllBad) (%AllGood)

largesite.com 19.1(62.6) 8.3(21.7) research.com 3.4(33.3) 4.5(6.2) wpi.edu 5.8(39.5) 7.4(11.0) cs.wpi.edu 1.8(19.6) 2.4(3.0) bonetumor.org 14.8(50.5) 9.5(19.8) aus.cricinfo.org 25.7(68.9) 10.4(36.1) uk.cricinfo.org 25.0(71.0) 10.6(30.9) usa.cricinfo.org 26.3(68.3) 9.8(28.4)

5.1.4

Rich Client Characterization

(9)

Table 8: Accesses by Clients Classi ed as Rich ( =0.7,cf=6 withclustering)

Good Bad

Performing Performing

Accesses Accesses

Log (%AllGood) (%AllBad)

largesite.com 3.5 (9.1) 0.3(1.0) research.com 19.0 (26.1) 0.6(5.9) wpi.edu 29.1 (43.1) 2.1 (14.3) cs.wpi.edu 26.0 (33.0) 0.4(4.3) bonetumor.org 5.8(12.1) 0.6(2.0) aus.cricinfo.org 2.4 (8.3) 0.3(0.8) uk.cricinfo.org 4.4(12.8) 0.4(1.1) usa.cricinfo.org 3.7(10.7) 0.3(0.8)

The meaning of results shown in Table 8 mirror those shown in Table 7. The second column in Table 8 is the percentage ofaccesses withgood performance fromclients classi edas richwiththerelativepercentageofall accesses with good performance in parentheses. The WPI and re-search.comresultsshowthebestabsoluteandrelative per-formance.ThelastcolumnofTable8showsthepercentage ofaccesses withbad performance from clientsclassi ed as rich.Asdesiredtheseabsoluteandrelativepercentagesare low.

5.1.5

Per-Client Characterization

Foragivenpolicy,eachclient isinitializedtothenormal class unless access information is known for other clients withinthesameclusterinwhichcasetheclientinheritsthe class of its cluster. Each client is potentially reclassi ed aftereachsequenceaccess. Foreachpolicy,wecountedthe numberofclientsineachclassattheendofprocessingeach log. We also kept track of the class for clients who were inactive for over a day and whose state information was removed.

Table 9shows the percentage of distinct clientsin each classfor the largesite.com logs. The policy parameters of =0.7,cf=1withclusteringareusedfortheresults. Clients areclassi edaccording tothe numberofsequenceaccesses theymadeinthelog. Becauseclientinformationisdiscarded afteradayofinactivity,accessesfromthesameclient sep-aratedbymorethanadayareclassi edasseparateclients.

Table 9: Percentage of Distinct Clients In Each Class(largesite.com)

Numberof ClientClass

ClientAccesses Poor Normal Rich Total

1-4 15.2 73.0 6.9 95.0 5-9 1.6 1.6 0.6 3.8 10-14 0.2 0.3 0.1 0.5 15-19 0.1 0.1 0.0 0.2 20-24 0.0 0.1 0.0 0.1 25+ 0.1 0.2 0.0 0.3 Totals 17.2 75.2 7.6 100.0

Theresultsshowthatthevastmajorityofclientscoming

classi edasnormal. Abouttwiceasmanyoftheremaining clientsareclassi edaspoorcomparedtorich. Asacontrast, usingthesamepolicyparametersfor theresearch.comlogs shows 78% of clients classi ed as normal, 18% as rich as the remaining 4% as poor. The di erence simplyre ects di erent characteristics amongst the set of clients visiting thesesites.

Wealsoexamined theclientcharacterization policiesfor theirstabilityintermsofthefrequencythattheclientclass changed. We found that for 72.5% of the largesite.com clientsstayedinthesameclassas itsinitial class. Inmost cases,theclientwasinitializedtonormalandstayedinthat class,butifclusterinformationwasavailablethentheclient may have been initialized to the poor or rich class. For 18.6%ofclients,therewasat leastonetransition fromthe poorclasstoanotherclassorfromanotherclasstothepoor class. Given that 17.2% of all clients nished inthe poor state (shown in Table 9), these results indicate that the classi cationisstable. Infactonly1.2%ofclientshadmore thantwotransitionsinoroutof thepoorclass. Similarly, 9.8%ofclientshadatleastonetransitionintooroutofthe richclass. Giventhat7.6%ofallclients nishedintherich class, these results also indicate stability of the classi ca-tion. Only1.1%ofclientshadmorethantwotransitionsin oroutoftherichclass.

5.2

Applicability of Potential Server Actions

InadditiontousingtheWebserverlogstoclassifyclients, we also usedthe logs to identify sequences exhibiting bad performanceanddeterminewhichoftheseactionscouldbe applied to improve performance. For this portion of the studyweanalyzedthecharacteristicsofallsequenceaccesses inwhichthedelayforthelastobjectretrievalwasmorethan 8 seconds. We sought to matchthe sequence characteris-ticsidenti edforpossibleserveractionsinTable3withthe characteristicsof accesseswithbad performance inthe set of logs. We considered all suchaccessesand did not limit ouranalysistoonlythoseaccessesfromclientsclassi edas poor.

Thecriteriatodeterminewhichsequencesareappropriate forwhichserveractions willvaryaccordingtothedi erent administrative policies at each site. For purposes of com-parisonsbetweendi erenttypesofsitesanddi erentserver actionsweapplyacommoncriteriatoallsites,butthis ap-proachisusedonlytounderstandthepotentialapplicability ofdi erentserveractions.

Table10showsthepercentageofsequenceswithbad per-formanceforwhicheachtypeofserveractionisapplicable. Table5hastheresultsonthepercentageofsequenceswith badperformanceineachlog.

(10)

Redirect Alter Compress Alter Alter

to Meta- Content Object Page

Log Replica Information Delivery Delivery Contents

largesite.com 42 25 59 67 72 research.com 38 16 42 26 52 wpi.edu 26 43 54 56 71 cs.wpi.edu 31 17 50 29 68 bonetumor.org 37 10 83 3 84 aus.cricinfo.org 47 45 9 32 35 uk.cricinfo.org 59 39 10 20 23 usa.cricinfo.org 53 27 15 24 28

ofallsequenceswithbadperformancethathaveadelayfor the rst objectgreater than5 seconds. Thepercentage of sequences is greater than 25% in all cases and shows the largestpercentageacrossallpotentialserveractionsforthe cricinfo.orglogs.

Notethat theinitial delaywouldnot includetimespent byaservertogeneratethecontent becausethetimestamp onthebaseobjectisthetimeatwhichitscontentiswritten to the client. It is possible that a large base page could causealonger initialdelay asthe client hasmorebytesto receiveandprocessbeforeitcanrequestsubsequentobjects. Forexampleinthe largesite.comlogs,37%ofallsequences withbadperformancehada rstobjectdelay greaterthan 5secondsandabasepagelargerthan10Kbytes.

Thenextcolumninthetable(alteringmeta-information) indicatesthe percentage of sequencesthat include at least one304 Not Modified response. These requests could be eliminated by increasing the freshness lifetime for objects or allowing clients to piggyback validation requests onto otherrequests. Theresults showthat this approachcould beofmoderateusefulness,particularlyforthewpi.eduand cricinfo.orglogs.

It is diÆcult to apply aprecise criterionfor when com-pression is appropriate. We selected a value of 40K total bytes ina sequenceasthe thresholdfor asequencewitha large numberof bytes. This is approximately the median value for total bytes in a sequence across all logs in our study. Themediansforindividuallogsrangedfrom15{65K bytes. The fourth columninthe table shows the percent-ageofsequenceswithbadperformancethathavemorethan 40K totalbytes. Thebonetumor.orglogshavethehighest percentageof largesequences. Allbutthe cricinfo.orglogs haveasigni cant numberofsequencesmatchingthis crite-ria. Adirectionoffutureworkisfurtherdistinctionbetween text andimage objects as image datacannotbe generally compressedasmuch.

Thenextcolumninthetableidenti esthepercentagesof sequenceswithbadperformancethathavealonglistof ob-jectsinthesequence. Thehandlingoftheselongsequences can be altered by bundling objects together. Alternately aservercanimprovethe e ectivenessof handlingeach re-questbyextendingtheconnection lifetimeorgiving higher scheduling priority for these clients. The results shown in thetablearethepercentageofsequenceswithatleast10 ob-jects. Thisisapproximatelythemedianlengthofsequences withpoorperformanceacross alllogs. Themediansfor in-dividuallogs are between 4and 14. The results show the

mostapplicabilityofthisserveractionforthelargesite.com andwpi.edulogs. Thebonetumor.orglogshavevirtuallyno longsequences.

Incontrasttoactionsthatdonotalterpagecontents,the last columninthetableshowsthepercentage ofsequences withbadperformancethatcouldbeimprovedbyservingan altered(bysizeandnumberofobjects)Webpage. This ac-tioncouldbeappliedtoallsequences,butismostapplicable tothosewitheitheralargenumberofobjectsortotalbytes. Thepercentagesinthetableare forallsequenceswithbad performancethathaveatleast10objectsormorethan40K totalbytes. Thisactioncouldbeappliedtothemajorityof sequencesinalllogsexceptfor thecricinfo.orglogs.

Additionalworkisneededto identifyappropriateserver actionsforasequencewithbadperformance. Beyond eval-uatingapplicabilitywehavetoexaminee ectiveness. How-ever,theresultsdoindicatethateachofthepotentialserver actionscouldbeappliedtoasigni cantnumberofsequences atleastonesite. Inaddition,thevarietyofserversitesyield di erentserveractionsthatwouldbemostapplicable.

5.3

Results Summary

Thereare numberofresultswewishtosummarizeabout ourbasicpremisethatclientscanbeclassi edbasedon char-acteristicsdeterminedbyaserverandthatarangeofserver actions couldbeappliedinthecase ofaclientclassi ed as poor.

 A relatively simple classi cation policy cancorrectly classify clients as poor so that the majority of se-quenceswithbadperformancecomefromtheseclients.

 Client clustering improvesthenumberof clientswho canbeinitiallyclassi edbasedonclusterinformation inmost logsand signi cantlyimprovesinsome logs. There are not only more classi cations, but the ac-curacyof theseclassi cations is comparableto when informationforonlyasingleclient isused.

 Therearetradeo sfortheaggressivenessofclient clas-si cation. Thenegativeimplications ofawrong clas-si cation should determine the aggressiveness of the policytouse.

(11)

 Clientclassi cationworksbestforsiteswithavariety ofclients. Classi cationofpoorclientsdoesnotwork wellwhenthereare relativelysmallnumbersof poor clientsvisitingthesite.

 Client characterization is stable for particular clients withfew transitionsbetweenclasses observedbeyond initialtransitionsforaclient.

 Not a singletype ofserver actionis most applicable for all sites and eachserveraction is applicable to a signi cantnumberofsequenceswithbadperformance onatleastonesite.

6.

RELATED WORK

Theconcept of dynamicallyalteringmultimediacontent ina Web pagedepending on network path characteristics wasreportedinarecentlyissuedUnitedStatespatent[19]. Inthisproposedscheme,aWebserverwouldmonitora va-riety of network characteristics (such as round trip time, packetloss) between itself and the client and correspond-inglyadjustthequalityofthecontentreturnedtotheclient. However,thispatentdealsexclusivelywithalteringthe con-tent or its delivery. It does not coverthe range of other serveractions thatare partof ourapproach. Additionally, thenetworkaware clustering we use to construct acoarse grouping of clients has been shown to be superior to the \nearness"approachdiscussedin[19]. Wearenotawareof anypublishedresearchontheideaproposedinthispatent. Rather thanlet the serverestimate aclient's character-istics, otherapproaches can be used. Explicit client spec-i cationof networkconnectivity is usedinmany multime-diaplayers. TheSPANDsystemusesanapproachwherea groupofclientsestimatecosttoretrieveresourcesandshare itamongstthemselves[23].

There is related workin adaptingthe content servedto usersbasedonserverload[1]. Asaserverbecomesloaded it begins to degrade the content served to lower-priority clients. In comparison with our work, server load is the onlyperformance measure that triggers an actionand the onlyactionconsideredistoservedegradedcontent.

TheApacheWebserver'scontributedcollectionof plug-insincludeaplug-incalledApache::Throttle [24]that im-plementscontentnegotiationbasedonthespeedofthe con-nection with the client. For example, poorly connected clients can receive lower resolution images. Connectivity ismeasuredviaahighresolution timerby notingthetime betweenthebeginningandtheendofsendingtheresponse totheclient attheapplicationlevel.

Recentworkalsoseeksto estimatetheservicetimefora clientthroughexaminationofserverlogs[4]. Howeverthis work focuses onhow to more precisely estimate the total time by instrumentingan Apacheserver to log additional data. This information is usedto analyze how the size of thewrite bu erat the serverin uencesperformance mea-surements. This worktries to makea moreaccurate esti-mationofclient responsetimethanourwork,butusesthis informationforamuchmorenarrowpurpose.

7.

SUMMARY AND FUTURE WORK

We have outlined a way to identify client's characteris-tics automatically by passive analysis of Web server logs. Evenoursimpleset ofcategories (poor,normal, and rich) canbeusedtodriveawiderangeofpossibleserveractions. Thecharacterization doesnothavetobefoolproofandour methodologyallowsmoreliberalandconservative classi ca-tion approaches. We providea mapbetween classi cation and the kindof server action that might be most appro-priate for a client. Althoughourclassi cation is basedon examiningsequencesofaccesses,weareabletoidentify in-dividualclient'sstabilityduringtheanalysis: amorestable client staysintheidenti edcategoryforalargeportionof the analysis period. Thestability of a client can help the servertodecideifitisanappropriatecandidatefortailoring serveractions.

We have explored applicability of our methodology by conductingarangeofexperimentsusingalargenumberof realandextensiveserverlogs. Thelogswereobtainedfrom diverseorganizationssuchas auniversity,alarge corpora-tion, and apopular sports sitemirrored in multiple coun-tries. Wedemonstratedstabilityofourclient characteriza-tion techniqueineachof the logs. Weestimatedthat sig-ni cant portions of sequenceswith badperformance could bene tfromsomeserveraction.

Ourmethodology isextensible;othertechniquesto char-acterizeclientscouldbeincorporatedtoincreasecon dence intheclassi cationandserveraction. Theuseof network-aware clustering to coarsely groupclients is also a helpful aid inthecharacterization. Ournovelproposalcanbe tai-loredtoanindividual Website, appliedautomatically,and validated toexamineifthe serveractionsresult inspeci c improvements.Theamountofstatethatneedstobe main-tainedcanbealtereddependingontheimprovements.

Whilethisworkhasexaminedthepotentialfeasibility of client characterization and serveradaptation, it leadsto a numberofdirectionsforfuturework. First,weplanto exam-ineawiderrangeofthresholdsandpoliciesforcategorizing richandpoorclientsaswellaswhatconstitutesgoodorbad performance. Second,we intendto validateour characteri-zationresultstoseeifpoor/richclientsarereallypoor/rich byactive measurementsfromknownclientsindi erent lo-cationstoaserverunderourcontrol.

Third,weplantoinstrumentarealWebservertostudy thepracticalityofaservermaintainingsuchinformationfor clientsandusingtheinformationtomakedecisionsin serv-ingcontent. Weneedtobetterunderstandthememoryand computationaloverheadofour approach. Finally,weneed tostudynotjusttheapplicabilityofvariousserveractions, but more importantly their e ectiveness inreducing user-perceivedlatency.

Acknowledgments

(12)

8.

REFERENCES

[1] TarekF.AbdelzaherandNinaBhatti.WebServer QoSManagementbyAdaptiveContentDelivery.In Proceedings oftheInternationalWorkshoponQuality ofService,London,England,June1999.

http://www.eecs.umich.e du/ ~zah er/ iwqo s99 .ps.

[2] MarkAllman.Measuring end-to-endbulktransfer capacity.InProceedingsoftheACMSIGCOMM InternetMeasurement Workshop,SanFrancisco,CA, November2001.

[3] V.Almeida,D. Menasce,R.Riedi,F.Peligrinelli, R.Fonseca,andW.MeiraJr.Analyzingtheimpactof robotsonperformanceofWebcachingsystems.In Proceedings ofthe6thInternationalWeb Caching WorkshopandContentDeliveryWorkshop,Boston, MA,June2001.

[4] O.Ardaiz,F.Freitag,andL.Navarro.Estimatingthe timeofserviceinaWebclientstartingfromthe serverlogs.InProceedings oftheACMSIGCOMM AmericaLatinaConference,SanJose,CostaRica, April2001.ACM.

[5] PierreBeyssac.BING:BandwidthpING,March1998. http://www.cnam.fr/rese au/ bing .ht ml.

[6] PeterBickford.Worththewait? ViewSource,Human InterfaceOnline,1999.

http://devedge.netscape .co m/vi ews ourc e/ bickford_wait.htm.

[7] RamonCaceres,FredDouglis,AnjaFeldmann, GideonGlass,andMichaelRabinovich.Webproxy caching: thedevilisinthedetails.InWorkshopon InternetServer Performance,Madison,Wisconsin USA,June1998.

[8] CC/PPWorkingGroup. http://www.w3.org/Mobil e/C CPP/ .

[9] WillyChiu.Bestpracticesforhighvolumewebsites, February2001.

http://www.worldinterne tce nter .co m/Ot her _Eve nts / Challenge-The-Expert/Fe b28 mega web site / SVWIC_Chiu_2_28_01.pdf.

[10] EdithCohen,BalachanderKrishnamurthy,and JenniferRexford.ImprovingEnd-to-EndPerformance oftheWebUsingServerVolumesandProxyFilters. InProceedingsoftheACMSIGCOMM'98

Conference,September1998.

http://www.acm.org/sigc omm /sig com m98/ tp/ abs_ 20. html .

[11] AllenB.Downey.UsingpathchartoestimateInternet linkcharacteristics.InProceedingsof theACM SIGCOMM'99Conference,Cambridge, MassachusettsUSA,September1999.ACM.

[12] BalachanderKrishnamurthy,Je reyC.Mogul,and DavidM.Kristol.KeyDi erencesbetweenHTTP/1.0 andHTTP/1.1.InProc.EighthInternationalWorld WideWeb Conference,Toronto,May1999.

http://www.research.att .co m/~b ala /pap ers /h0v h1. html .

[13] BalachanderKrishnamurthyandJenniferRexford. WebProtocols andPractice: HTTP/1.1,Networking Protocols,Caching,andTraÆcMeasurement. Addison-Wesley,May 2001.ISBN0-201-710889-0.

[14] BalachanderKrishnamurthyandJiaWang.On network-awareclusteringofWebclients.In Proceedings ofACMSIGCOMM,August2000.

[15] BalachanderKrishnamurthyandCraigWills.Studyof PiggybackCacheValidationforProxyCachesinthe WorldWideWeb.InUSENIXSymposiumonInternet TechnologyandSystems,pages1{12,December1997. http://www.research.att. com /~ba la/ pape rs/

pcv-usits97.ps.gz.

[16] BalachanderKrishnamurthyandCraigE.Wills. Piggybackserverinvalidationforproxycache coherency.InSeventhInternationalWorldWideWeb Conference,pages185{193,Brisbane,Australia,April 1998.

[17] RobertB.Miller. Responsetimeinman-computer conversationaltransactions.InProc.SprintJoint Computer Conference,Montvale,NJ,1968.AFIPS Press.

[18] J.Mogul,B.Krishnamurthy,F.Douglis, A.Feldmann,Y.Goland,A.vanHo ,and

D. Hellerstein.DeltaencodinginHTTP.RFC3229, IETF,January2002. ProposedStandard.

[19] Je reyC.MogulandLawrenceS.Brakmo.Method for dynamicallyadjustingmultimediacontentofa Webpagebyaserverinaccordancetonetworkpath characteristicsbetweenclient andserver,June2001. UnitedStatesPatent6,243,761.

[20] Je reyC.Mogul,FredDouglis,AnjaFeldmann,and BalachanderKrishnamurthy.Potentialbene tsof deltaencodinganddatacompressionforHTTP.In ProceedingsoftheACMSIGCOMM1997Conference, pages181{194,August1997.

http://www.acm.org/sigco mm/ sigc omm 97/p ape rs/p 156 .htm l.

[21] JakobNielsen.Designing WebUsability.NewRiders, 2000.

[22] RamRajamonyandMootazElnozahy.Measuring client-perceivedresponsetimeontheWWW.In USENIX SymposiumonInternet Technologyand Systems, SanFrancisco,California,USA,March2001.

[23] SrinivasanSeshan,MarkStemm,andRandyH.Katz. SPAND:sharedpassivenetworkperformance

discovery.InUSENIX SymposiumonInternet TechnologiesandSystems, Monterey,California,USA, December1997.

[24] Apache::Throttle-Apache/Perlmodulefor speed-basedcontentnegotiation.

http://www.dmi.usherb.ca /la bora toi res/ documentations-logiciels /Pe rl/l ib/ Apac he/ Throttle.html.

[25] CraigE.Wills,MikhailMikhailov,andHaoShang.N for thepriceof1: BundlingWebobjectsformore eÆcient contentdelivery.InProceedingsof theTenth InternationalWorldWideWebConference,Hong Kong,May2001.

References

Related documents

We assume that the provided graph is simple and weakly triangulated and the given augmented_embedding structure represents a valid planar embedding o f the

Thus, production of phenolic intermediate metabolites and phenolic acids were detected earlier during the incubation and reached higher concentrations in the

Objective : to build and reinforce the growth and sustainability of rural micro and small businesses and to support initiatives that will lead to farm diversification and a

The objective of a graduate educational program in medical physics is to provide its graduates with the basic and applied scientific knowledge that is necessary both for

■ San Antonio’s hospitals, clinics, federally qualified health centers, mental health providers, private sector physicians, public health departments, and community based

✤ IOKit can share data directly with user space apps. ✤ Assume user space apps know the structure

• STEM (science, technology, engineering, and math) program delivery or STEM staff professional development • Expanded or extended learning time and the OST hours. •

2008 Cinéma Revisité, Les Rencontres Internationales, Centre Pompidou, Paris Documental: Contemporary Video Art from Europe, Guggenheim Gallery, L.A./ U.S.A. Marler Video