

Using the Domain Name System

Jussi Kangasharju, Keith W. Ross

Institut Eurecom

Sophia Antipolis, France

fkangasha,[email protected]

James W. Roberts

France Telecom – CNET

Issy les Moulineaux, France

Abstract

In order to reduce average delay and bandwidth usage in the Web, geographically dispersed servers often store copies of popular objects. For example, with network caching, the origin server stores a master copy of the object and geographically dispersed cache servers pull and store copies of the object. With site replication, objects stored at a master site are replicated into secondary sites. In this paper we propose a new network application, Location Data System (LDS), that allows an arbitrary host to obtain the IP addresses of the servers that store a specified URL. Our networking application is an extension to the Domain Name System (DNS), requires only small changes to the domain name servers, and can be deployed incrementally. For the case of network Web caching, we elaborate on our proposal to allow a cache to (i) update a distributed database when it stores or evicts objects, and (ii) push objects to parent caches in order to improve delay and bandwidth usage. For the case of mirrored servers, we show how a client can obtain a list of all servers mirroring all or part of the desired site. LDS applied to partially mirrored sites generates substantially less DNS traffic than LDS applied to caching. Finally, we discuss how a host can use the location data in order to make intelligent decisions about where to retrieve desired objects.

1 Introduction

Network caching of documents has become a standard way of reducing network traffic and latency in the Web. Caches are currently employed in institutional, local, regional and national ISPs. Cache hierarchies, created when caches in lower-level ISPs point to caches in higher-level ISPs, are currently prevalent in the Internet [8, 9]. Today's cache hierarchies use static, manually configured pointers to define the hierarchy tree. Cache hierarchies operate as follows. When a browser requests a document, it sends a request to a leaf cache. This cache then either serves the document (if it is cached) or forwards the request to its parent in the hierarchy. The process is repeated along a static chain of caches until the root of the hierarchy is reached. If there is also a cache miss at the root, the root forwards the request directly to the origin server. A response is returned along the cache chain in the reverse direction. Cooperating caches in cache hierarchies often use ICP (Internet Cache Protocol) to improve and enlarge the scope of the search [15].

Caching hierarchies have several problems. First, requests for less popular documents will experience misses at all caches in the cache chain. For deep hierarchies, these misses lead to poor latency performance [13]; moreover, ICP can further degrade performance, since the cache must wait for a reply from all [...]


they do not permit a browser or a cache to choose the subsequent cache according to current topology or traffic conditions. Manual optimizations, such as sending requests for certain top-level domains to designated parent caches, are possible, but even with these optimizations, the hierarchy for a given URL remains static. Third, caching hierarchies do not allow the chain of caches to extend beyond the root cache; if there is a miss at the root server, the request is forwarded directly to the origin server, and never to a cache that lies somewhere in between the root server and the origin server. This is a problem because a cache near the origin might be able to serve an object much faster than the origin server, especially when the origin server runs on a slow machine or has a low-bandwidth connection.

In this paper we propose a new cooperative caching scheme that has the following features. (1) At most two servers (including the origin server) are visited in the request chain. (2) The chain of caches depends on the requested URL and can change dynamically as a function of current network topology and traffic conditions. (3) An arbitrary cache in the Internet can be queried, including a cache that is far from the browser but close to the origin server. (4) The scheme can be incrementally deployed with minor changes to DNS servers. Furthermore, although a thorough performance study is still required, we feel the scheme should lead to a substantial reduction in delay and network traffic as compared to traditional hierarchical caching.

Our caching scheme makes use of a new network application, the Location Data System (LDS), which we also define in this paper. The LDS, defined as an extension of DNS, allows an arbitrary host to obtain the IP addresses of the servers that store a specified URL. The LDS is of independent interest, and can be used for other applications, including choosing the best mirrored site for a given URL. For the case of network Web caching, we specify how a cache updates the LDS distributed database when it stores and evicts objects, and how a cache pushes objects to parent caches in order to improve delay and bandwidth usage.

[...] caching significantly increases the number of DNS messages. In this paper we also show how LDS can be applied to replicated and partially replicated servers. In this case, the number of DNS messages does not increase.

This paper is organized as follows. In Section 2 we define the LDS. In particular, we show how DNS can be extended to provide the LDS service. In Section 3 we show how network caches can exploit the LDS; in particular, we discuss how the caches update the LDS distributed database, and how caches at higher levels in the caching hierarchy can be populated. In Section 4 we show how document and site replication can exploit the LDS. In Section 5 we discuss how a host can make routing decisions based on the results of an LDS query. In Section 6 we discuss related research on cooperative caching. Section 7 presents directions for future work and Section 8 concludes the paper.

2 Location Data System (LDS)

Copies of an object, with each copy referenced by the same URL, are often available from different servers in the Internet. These servers include origin servers, replicated servers (also called mirrored sites) and cache servers (also called proxy servers). The origin server for an object is the server at which the object originates; it always contains an up-to-date copy of the object. A replicated server contains copies of objects that have been placed into it (typically manually or by pulling); typically, objects in replicated servers are up-to-date. A replicated server may replicate entire Web sites or may replicate only portions of various Web sites. Cache servers obtain copies of objects by pulling them on demand. In particular, when a cache server receives a request for an object, and if the object is not cached, the cache server retrieves the object from another server (which may be the object's origin server, a replicated server, or another cache server), stores a copy of the object, and [...]


In this paper we refer to all three types of servers as object servers.

It is highly desirable for a host in the Internet to be able to determine the locations (i.e., the IP addresses) of all the object servers that contain copies of a specified URL. In particular, a host would like to be able to give a network application a URL and receive from the application a list of all the object servers that contain the URL. In addition to receiving the IP addresses of all the servers that contain the object, it is desirable to receive information about the freshness of each copy, e.g., when each copy was last modified or the "age" of the object (as defined in HTTP/1.1 [2]).

In this section we outline a new networking application that provides the service of mapping a URL to a list of object servers that contain the URL, with each server on the list having associated freshness information. We refer to this system as the Location Data System (LDS). We have taken a rather pragmatic approach in designing the LDS. Our principal goal has been to design a system that can be rapidly deployed with incremental changes to the existing Internet infrastructure. A second goal is scalability, i.e., a system that is decentralized and hierarchical. A third goal is performance, that is, a system that quickly returns the location data while injecting a minimum of overhead traffic into the network.

Figure 1 shows the basic mapping service provided by LDS. In the figure, the client on the left has a URL and gives it to the LDS system, which returns a list of object servers that contain the object. Instead of the whole URL, the client can also give only a prefix of the URL to the LDS black box, which then returns a list of object servers that contain URLs matching the prefix. In the simplest case, the prefix is only the hostname of the URL; in this case, the LDS service is identical to the Domain Name System, operational in the Internet for over 20 years.

Given the similarities in the service provided by both DNS and LDS, we have decided to base the design of LDS on DNS. In fact, LDS can be implemented by making minor extensions to BIND name servers and to the DNS protocol. Recall that DNS provides a mapping of hostnames to IP addresses. (DNS can return more than one IP address for a hostname for mirrored sites.) DNS uses a hierarchy of name servers to implement a distributed database of resource records, whereby a resource record contains a mapping.

Figure 1: LDS service. [Diagram: the Client passes a URL to the LDS Black Box, which returns a list of object servers that contain the object.]

We now present an overview of the LDS design; we provide more details in the subsequent subsections. As does DNS, LDS uses a hierarchy of servers to implement a distributed database of resource records. We refer to the LDS servers as location servers. Each LDS resource record maps a URL to an object server, with associated freshness information. Each URL has one (or more) authoritative location servers. (To convey the main idea, we initially assume that each URL has exactly one authoritative server.) The authoritative location server contains a list of resource records for the URL, that is, a list of object servers that contain the URL (along with the freshness information). Other location servers may contain cached copies of the list of resource records. In our basic design, hosts query location servers in a manner that is fully analogous to the DNS protocol. The sequence of query and reply events is illustrated in Figure 2.

In Figure 2, C is the querying host (e.g., a browser), O_0 is the origin server for the queried object, machines O_1–O_4 are other object servers (e.g., caches), and machines L_1–L_4 are location servers.

The querying host sends a location query to its local location server (L_1). If L_1 does not have the location information cached, it sends a query to a root location server (L_2). If L_2 does not have the location information cached, it returns the address of a location server responsible for the domain of the origin server in the query (L_3). L_1 sends a new location query to L_3 and if L_3 does not have the location information, it returns the address of the authoritative location server (L_4). Finally, L_1 queries the authoritative location server and receives the location information. Location server L_1 then sends the information to the querying host and also caches the information. (In this example we assumed that all queries between LDS servers are iterative; recursive queries and combinations of recursive and iterative queries can also be used. Like DNS, LDS does not depend on either query type being used. We also assumed that there is one intermediate location server between the root and authoritative servers; in practice there can be more or fewer.)

Figure 2: Sequence of Location Queries and Replies. [Diagram: querying host C, location servers L_1–L_4 and object servers O_0–O_4, with the messages numbered 1 to 9.]
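The iterative lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class LocationServer, its referrals table and the function resolve are hypothetical names, TTLs are ignored, and real location servers would exchange DNS messages rather than call each other directly.

```python
from dataclasses import dataclass, field


@dataclass
class LocationServer:
    name: str
    records: dict = field(default_factory=dict)    # url -> list of object servers
    referrals: dict = field(default_factory=dict)  # domain suffix -> next server

    def query(self, url: str, domain: str):
        """Return ("answer", servers) or ("referral", next_server)."""
        if url in self.records:
            return "answer", self.records[url]
        for suffix, server in self.referrals.items():
            if domain.endswith(suffix):
                return "referral", server
        raise LookupError(f"{self.name} cannot resolve {url}")


def resolve(local: LocationServer, root: LocationServer, url: str, domain: str):
    """Iterative lookup performed by the local location server (L_1)."""
    if url in local.records:            # cached from an earlier query
        return local.records[url]
    server = root
    while True:
        kind, value = server.query(url, domain)
        if kind == "answer":
            local.records[url] = value  # L_1 caches the reply
            return value
        server = value                  # follow the referral
```

With a root that refers to an intermediate server, which in turn refers to the authoritative server, resolve follows the same chain of queries as in Figure 2 and leaves a cached copy of the reply at the local server.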

Suppose that the reply indicated that object servers O_0, O_1, and O_3 contain copies of the URL. The querying host then chooses to retrieve the URL from one of these object servers. Ideally, the host chooses the object server that will deliver the object the fastest. (We will discuss how this choice might be made in Section 5.)

Given the similarities between our basic design of the LDS and the DNS, we now address whether the LDS can be implemented in the existing DNS, or within a slightly extended [...] lead to rapid deployment of the LDS.

2.1 Resource Records, Query Messages and Reply Messages

For location requests a new DNS resource record type is needed. It is similar to the standard A-type resource record, which maps a hostname to an IP address. We now describe this new DNS resource record type.

Each resource record of this type must contain an object identifier. In order to be compatible with standard DNS queries, object identifiers must be encoded in a standardized way. Using this encoding, the query appears like a normal DNS A-query and can be resolved without changing any parsing routines in DNS servers. Figure 3 shows how the URL http://www.eurecom.fr/~bob/index.html would be encoded.

4 html 5 index 4 ~bob 4 http 3 www 7 eurecom 2 fr 0

Figure 3: Representation of a URL

This encoding allows for efficient compression of identifiers sharing a common part, just as when querying the addresses of all of the servers in a given domain. Since the identifier can be viewed as a generalized hostname, all the normal facilities of DNS can be used.

We mention that there is a 255-character limit on hostnames in DNS queries, which implies that URLs longer than 255 characters will have to be truncated. Also, the length of a single label (e.g., one component of a hostname or URL path) is limited to 63 characters. We do not expect these limits to be a problem. We also mention that URLs are case-sensitive but DNS does not guarantee case-sensitive treatment. Nonetheless, we do not expect this to be a major issue since most URLs cannot be confused with other URLs even if case is not preserved.
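As an illustration of the encoding and its limits, the following Python sketch turns a URL into a sequence of length-prefixed labels. It is a reading of the single example in Figure 3, not a specification: the rule of splitting the path on "/" and "." and reversing it is an assumption inferred from that example, the function names are hypothetical, and the sketch raises an error for an over-long identifier rather than guessing how the whole name would be truncated.

```python
import re
from urllib.parse import urlsplit

MAX_LABEL = 63    # DNS limit on a single label
MAX_NAME = 255    # DNS limit on the whole encoded name


def url_to_labels(url: str) -> list[str]:
    """Turn a URL into DNS-style labels, most specific part first."""
    parts = urlsplit(url)
    # Assumption: split the path on "/" and "." and reverse the pieces,
    # which reproduces the label order of the Figure 3 example.
    path_labels = [p for p in re.split(r"[/.]", parts.path) if p]
    path_labels.reverse()
    host_labels = parts.hostname.split(".")
    return path_labels + [parts.scheme] + host_labels


def labels_to_wire(labels: list[str]) -> bytes:
    """Length-prefix each label and terminate with a zero byte."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")[:MAX_LABEL]   # simple truncation of a label
        out.append(len(raw))
        out += raw
    out.append(0)
    if len(out) > MAX_NAME:
        raise ValueError("encoded identifier exceeds 255 bytes")
    return bytes(out)
```

For the URL of Figure 3, url_to_labels yields the labels html, index, ~bob, http, www, eurecom and fr, in that order.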


[...] exceeded these limits. Even if these 122 URLs were truncated to conform to DNS limitations, we did not observe any collisions between two different URLs. Neither did we find any URLs where case-insensitivity would have presented any problems. Therefore, we do not propose any methods for handling such URLs beyond simple truncation.

The reply will include several resource records, one for each object server holding a copy of the URL. Each of these resource records has a time-to-live (TTL) field which specifies how long that resource record can be cached at a DNS server. This TTL information can either be specified by the authoritative location server or, for a better estimate, by the object server. If the location server sets this TTL field, it will be the same for all resource records from that server. If, on the other hand, this field is set by the object server, it allows the object server to communicate information on how long the object is likely to be available at that server.

The actual data section of the resource record (RDATA) contains the IP address of the object server and some freshness information about the object. This information could be, for example, the last modification date of the object, which provides an easy way of distinguishing between possibly stale copies of the object. In this case, the authoritative location server would have to periodically query the origin server to get an up-to-date modification date for the original object.
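To make the contents of such a record concrete, here is a minimal Python sketch of a record carrying an address, a last modification date and a TTL, together with a filter that discards stale copies. The class LdsRecord, its field names and the function fresh_copies are hypothetical; the paper does not fix a record layout at this level of detail.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LdsRecord:
    address: str              # IP address of the object server
    last_modified: datetime   # freshness information carried in RDATA
    ttl: int                  # seconds the record may be cached


def fresh_copies(records, origin_last_modified):
    """Keep records whose copy is at least as new as the origin's copy."""
    return [r for r in records if r.last_modified >= origin_last_modified]
```

A querying host that knows the modification date of the original object can then restrict its choice of object server to the records returned by fresh_copies.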

Another possible piece of information that could be included in the resource record, instead of the last modification date, is the "age" of the object, as defined in HTTP/1.1. An object server could determine this locally, and a small age would refer to a relatively fresh copy. This method would not, however, offer the same guarantees on object freshness as would the last modification date.

Yet another possibility is to include simple TTL information, but this would be redundant, since the resource record by definition includes a TTL field. A TTL estimate is useful when the querying host decides which object server to use. Object servers with short [...] decision making process.

Because all location servers are allowed to cache LDS replies, some location servers may have stale location information in their caches. When the status of an object changes at an object server, the object server notifies the authoritative location server. This update is not, however, reflected in the cached copies of the location information. If stale, cached location information is delivered to a querying host, the querying host may decide to forward the request to a Web cache that has evicted the object.

One solution is to use the only-if-cached directive of HTTP/1.1 when requesting objects from Web caches. If the distant cache has the object cached, it will return the object from the cache; otherwise it will return an error to the querying host indicating that the object is no longer cached. The querying host will then choose another object server from the list. We are currently looking into ways of reducing the amount of stale location information in LDS.
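The fallback behaviour can be sketched as follows. The function name and the send_request hook are hypothetical; the hook stands in for whatever HTTP client the querying host uses and is assumed to return a status code and a body. In HTTP/1.1 a cache that cannot satisfy an only-if-cached request answers with status 504 (Gateway Timeout).

```python
def fetch_from_caches(url, servers, send_request):
    """Try each object server in turn until one still holds the object.

    send_request(server, url, headers) is a hypothetical transport hook
    that returns a (status_code, body) pair.
    """
    headers = {"Cache-Control": "only-if-cached"}
    for server in servers:
        status, body = send_request(server, url, headers)
        if status == 200:
            return server, body
        # A 504 reply means the object is no longer cached there; this
        # sketch treats any other non-200 status the same way and moves on.
    return None, None
```

If no cache on the list still holds the object, the caller gets (None, None) and can fall back to the origin server or issue a fresh LDS query.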

2.2 Updating the Location Servers

The object servers need to keep the information at the authoritative location server up-to-date. For this we need a new protocol that is used to exchange information about new copies of objects. For example, when a cache caches a new object, it must send a message to the authoritative location server responsible for this object, indicating that the object has been cached. This message should also include the information which will be present in the resource record (e.g., last modification date) as well as a TTL estimate. Similarly, when the cache removes an object, it must send a message to the location server indicating that the object is no longer cached.

Since the location data is managed over DNS, the dynamic update facility of DNS [14] can be used to dynamically update the location information. With this method, it is [...] desired.

Using the dynamic update facility of DNS, an object server sends an update message every time the status of the object changes. For a Web cache this change of status could be the caching of a new object, a removal of an object, or caching a new version of the object after the object has been modified at the origin server. In order to reduce the amount of update traffic, object servers can send their reports in batches. This is especially beneficial for Web caches that can send the information about an HTML page and all inlined images in one message, thereby reducing the overhead considerably.
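Batching can be illustrated with a small Python sketch. The class UpdateBatcher and the authoritative_for hook are hypothetical, and the sketch only groups status changes by authoritative server; it does not build real DNS dynamic update messages.

```python
from collections import defaultdict


class UpdateBatcher:
    """Collects status changes and groups them per authoritative server."""

    def __init__(self, authoritative_for):
        # authoritative_for(url) -> server name; a hypothetical lookup hook.
        self._authoritative_for = authoritative_for
        self._pending = defaultdict(dict)

    def record(self, url, status):
        """status is "cached" or "removed"; a later change overwrites an earlier one."""
        self._pending[self._authoritative_for(url)][url] = status

    def flush(self):
        """Return one message per authoritative server and clear the queue."""
        messages = {server: dict(changes) for server, changes in self._pending.items()}
        self._pending.clear()
        return messages
```

A cache that stores an HTML page and its inlined images would call record once per object and flush once, producing a single message for the authoritative server of that site. Overwriting earlier changes is a design choice of the sketch: an object cached and then removed before the flush is reported only as removed.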

2.3 Implementation

Implementing the LDS is relatively straightforward and can be done incrementally. Since the system is an extension to DNS, no new servers need to be created and introduced.

For a DNS server to be LDS-capable, it needs to be able to interpret a new query type code and the associated resource records. DNS already handles numerous different query and resource record types, so adding one more is simple. Furthermore, the authoritative server has to be able to maintain the location information database and construct resource records from this database to include in replies to queries. Although this URL database contains more data than a normal DNS database, it can be maintained in a similar way. For a Web cache to be LDS-capable, it simply has to be able to send the update messages to the authoritative server.

The architecture allows for incremental deployment, since if a content provider does not operate a location server, no LDS resource records will exist. If no LDS resource records are available, the querying host would forward all requests using a static policy, for example, forwarding them directly to the origin server or a parent in a cache hierarchy. If LDS records exist, then LDS-enabled clients [...] normal, static methods for finding objects. In the early phases of deployment, the traditional hierarchy should be kept as a fallback for clients and servers that do not implement the new protocols.

3 Network Web Caching

In this section we propose a new cooperative caching scheme that exploits the LDS. We assume that browsers forward their requests through a proxy cache in addition to the browser cache. This method is currently widely used by ISPs and other institutions, such as companies and universities, to offer a better service to their clients as well as to reduce the outgoing traffic. Also, if a firewall is used, then all requests must go through a proxy at the firewall in any case, so adding a cache there does not present a significant overhead.

We now describe our cooperative caching scheme. A browser first sends an HTTP request for an object to its proxy cache. If the proxy cache does not have the object, the proxy cache invokes LDS to obtain a list of all the object servers (i.e., origin server and some other caches in the network) that contain the object. The proxy cache then chooses the "best" object server from the list and forwards the HTTP request to this object server (see Section 5). The proxy receives the object from the best object server and forwards the object to the browser.

Our caching scheme has several appealing features. First, at most two servers are visited to retrieve an object. Standard hierarchical caching schemes (such as NLANR [8] and Renater [9]) can cause requests to pass through a large number of object servers before a copy of the object is found, which can severely increase object retrieval time [13]. Second, the scheme allows the proxy server to choose from all the object servers that contain the object. Let us look at a couple of scenarios:


• LDS returns a list of object servers, with one cache server in a neighboring ISP and all the other object servers on more distant ISPs. In this case, the proxy server would choose the cache in the neighboring ISP.

• LDS returns a list of object servers, with one cache server on the same continent as the proxy server, and all the other object servers on the other side of transoceanic links. In this case, the proxy server chooses the cache on its continent.

• LDS returns a list of object servers, with all the object servers far from the proxy and near or in the origin server's ISP. In this case, the proxy may still prefer to choose one of the caches over the origin server: the origin server may run on a slow machine or have a slow network connection.

It is important to note that traditional hierarchical caching does not permit the last option. With hierarchical caching, if there is a miss at the root cache, the root cache forwards the request directly to the origin server. Thus our scheme enables the proxy to select from all caches that are on the "route" between the proxy and the origin server.

Our caching scheme does have a peculiarity. In the scheme just described, the proxy server will always retrieve an object either from the origin server or from some other proxy server. Because all the proxy servers are near the bottom of the Internet hierarchy (in institutional ISPs or local residential ISPs), with the exception of the origin server, all copies of an object are stored, essentially, in the leaves of the Internet. There are, however, several compelling reasons to also store copies of objects higher up in the Internet hierarchy, in particular in regional and national ISP caches. First, the requesting proxy may be able to retrieve an object more quickly if it is available from a common parent (or grandparent) of some other proxy cache that has the object. Second, hierarchical caching provides an asynchronous multicast infrastructure, which can significantly reduce bandwidth usage in the Internet [10]. Given that cache servers are present at regional and national [...] caches into hierarchies, we now modify our caching scheme to exploit the higher-level caches.

3.1 Populating Caches

In order to exploit the higher-level caches, we must make sure that the higher-level caches obtain copies of the objects cached at lower levels. We propose that lower-level caches push objects to their higher-level parents. This way the objects are easily available for siblings under the same parent, and requests from distant hosts can be satisfied at a higher level in the network.

The details of our pushing strategy are as follows. A low-level cache periodically sends a list of new objects (with last modification dates) to its parent. The parent decides which of the objects in the list to cache and retrieves these from the child cache. This is done at every level in the hierarchy and, in the end, the caches are filled up as they would have been using traditional hierarchical caching. Caches at the lowest level should send information about every object, but at higher levels a cache might use some locally defined policy (e.g., objects that are cached in multiple child caches) to decide which objects to report to its parent.
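The example policy mentioned above, reporting only objects that are cached in multiple child caches, could look as follows in Python. The function name, the shape of child_lists and the default threshold of two children are assumptions made for this sketch.

```python
from collections import Counter


def objects_to_report(child_lists, min_children=2):
    """Select the URLs announced by at least min_children child caches.

    child_lists maps a child cache name to the list of URLs it announced.
    """
    counts = Counter(url for urls in child_lists.values() for url in set(urls))
    return sorted(url for url, n in counts.items() if n >= min_children)
```

Raising min_children makes a higher-level cache more selective about what it reports to its own parent.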

3.2 Network Traffic

Since LDS increases the number of messages sent over the network, it may overload some links or servers and in fact degrade performance. We are currently performing an analysis of how much LDS increases current network traffic. The following are the key factors:

1. LDS queries and replies: Although these are normal DNS messages, an LDS query is sent every time a proxy needs to retrieve a URL that it doesn't have cached. Caching LDS records at the [...] ordinary DNS, since a URL reference is more specific than a hostname reference.

2. Update traffic: When the status of an object changes in a cache, the cache must send an update message to the object's authoritative location server. For popular, widely cached objects, this may result in too much traffic directed at the authoritative server.

In order to reduce the amount of LDS lookup traffic in the Internet, we propose a variation to our basic LDS scheme. In this variation, once we get the location information for an HTML page we assume that all the inlined objects on the page are also available from the same set of object servers. One advantage of this scheme is that it heavily cuts down on the amount of LDS traffic compared to the basic scheme, since we now only send one LDS query for each HTML page instead of each URL. The disadvantage of the variation is that there are no guarantees that the inlined objects are available from the same object servers. Origin servers and replicated servers should have the objects but a cache may have already purged some or all of them. If the object server does not have the requested object, we will do an LDS query for that object to obtain up-to-date location information.

In Section 4 we will show how the amount of LDS traffic can be significantly reduced when the LDS system is restricted to replicated servers.

To address the issue of update traffic, we propose that any cache that has a parent refrain from sending update information. Instead, the update information is forwarded to the parent during the pushing phase. The highest parent that does not have a copy of the object sends to the authoritative server an update message that contains update information for itself and its updated children. The benefit is that there are significantly fewer higher-level caches than low-level caches, thus much less update traffic.

In the above scheme, the highest parent in the hierarchy sends update messages for all its children, which are thus all included in the database at the location server. As a result, institutional caches would also be included in the list returned by the location server, which means that they could receive requests for cached objects from hosts outside their own network. It is desirable to keep the institutional caches off the list of locations for two reasons. First, institutions likely do not want outside hosts using their bandwidth to retrieve objects. Second, objects at institutional caches are likely to be cached in the parent caches and outside hosts can retrieve objects faster from the parent cache.

Cache digests [11] are used in traditional hierarchies to represent cache contents in a compressed form. A cache fetches the digests of its neighbor caches and when a request cannot be satisfied locally, the cache checks the digests of the neighbor caches to see whether any of them have the object cached. If so, the object is retrieved from that cache; otherwise the request is forwarded to the parent cache. In LDS-based caching, having the digest of the parent cache is useful because this way we avoid the LDS lookup for objects that are cached at the parent and would in any case be retrieved from there.

4 Replicated Servers

The LDS scheme is not limited to caching. It can also be used to disseminate information about replicated or mirrored objects. When an object is replicated at another server, this information is added into the location database at the authoritative location server. When a client requests this replicated object, it does an LDS lookup and gets a list of all servers holding a copy of the object. Because the information is not dynamically replicated as in caching, but rather placed at well-chosen locations, there is rarely need to notify the [...] line.

With LDS, a user addresses a replicated object by the object's unique URL, and LDS returns the list of servers that have the object. The browser then transparently chooses one of these (the best in some sense). The user would not have to know about the actual physical location of the object.

4.1 Reducing LDS Query Traffic

The scheme just described for replicated servers still requires that clients send an LDS lookup for each URL. In order to reduce the number of LDS messages we propose the following method of replicating servers and storing information about the replicated URLs. In this scheme, the LDS no longer keeps track of cached objects. It only tracks replicated and partially replicated servers. We require that replicated servers replicate either whole sites or complete sub-trees of servers. An example sub-tree of an origin server is all the objects on the origin server below a given URL prefix, e.g., all objects under http://cnn.com/sports/.

Suppose that a client wants to retrieve a URL. Normally, a client would send a DNS lookup to obtain the IP address of the origin server and retrieve the object from there. Using LDS the client proceeds as follows. First, the client sends an LDS query for the hostname in the URL. The LDS system returns a list of all servers that mirror either the whole site or a complete sub-tree from the site. We need to extend the resource record defined in Section 2.1 to include a prefix indicating the sub-tree that is mirrored by that particular server. The client then chooses the "best" server among those servers that mirror the desired sub-tree and forwards the actual HTTP request to this server. A server may mirror multiple sub-trees from a single origin server; in this case LDS would return one resource record for each sub-tree.
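The selection of candidate mirrors from such a reply can be sketched in Python. The class MirrorRecord and the function mirrors_for are hypothetical, and the sketch assumes that every prefix ends with "/" so that a plain string-prefix test cannot match a sibling directory with a longer name.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class MirrorRecord:
    address: str   # IP address of the mirror
    prefix: str    # mirrored sub-tree, e.g. "/sports/"; "/" means the whole site


def mirrors_for(url, records):
    """Return the mirrors whose sub-tree contains the requested URL."""
    path = urlsplit(url).path or "/"
    return [r for r in records if path.startswith(r.prefix)]
```

For a URL under http://cnn.com/sports/, a full-site mirror and a mirror of the sports sub-tree both qualify, while a mirror of some other sub-tree does not; the client then applies its notion of "best" to the remaining candidates.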

Compared to the caching scheme discussed in Sections 2 and 3, this scheme has several [...] only one LDS lookup for each server; in the caching scheme, the client must send an LDS lookup for each URL. The caching scheme drastically increases DNS traffic in the network while the mirroring scheme keeps DNS traffic at its current level. (Recall that a client would in any case have to do a DNS lookup to get the IP address of the origin server.) Second, objects at mirrored servers tend to stay at the mirrored servers for long periods of time while objects in caches are cached and purged dynamically. Hence, LDS information for a mirrored server can be cached longer at location servers, which greatly reduces lookup traffic. When the authoritative LDS server sends information about a mirrored site, it should first send out information about mirrors that mirror the most of the site. This is because DNS replies are by default sent over UDP and the message size is severely limited (512 bytes). By sending information about mirrors that mirror large subtrees, we maximize the likelihood that the client has the necessary information in the reply.
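One way to order the records under the size limit is a simple greedy pass, sketched below in Python. The function pack_reply and its size_of and coverage_of hooks are hypothetical, and the sketch treats the 512 bytes as a budget for the records alone, ignoring the DNS header and question section.

```python
def pack_reply(records, size_of, coverage_of, limit=512):
    """Greedy sketch: consider the largest sub-trees first.

    size_of(record) gives the encoded size in bytes and coverage_of(record)
    gives a measure of how much of the site the mirror holds; both are
    hypothetical hooks supplied by the caller.
    """
    ordered = sorted(records, key=coverage_of, reverse=True)
    reply, used = [], 0
    for record in ordered:
        size = size_of(record)
        if used + size > limit:
            continue          # skip a record that no longer fits
        reply.append(record)
        used += size
    return reply
```

Skipping a record that does not fit, rather than stopping at it, is a choice of this sketch; it lets smaller records for less complete mirrors use the remaining space.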

Compared to the standard DNS way of accessing objects, our mirroring scheme does not increase the number of DNS messages in the network. The reply messages are slightly larger, however, since we need to indicate the prefix in the resource record. We do not expect this increase in size to be significant since URL prefixes can be efficiently encoded using the encoding specified in Section 2.1 and DNS resource record pointers.

5 Routing Decision

When the cache has received a reply to its LDS lookup containing several object servers (caches and origin servers), the next question is: "Which one of the possibilities to choose?" Determining the best alternative is an important topic of our ongoing research.

Possible solutions include:

• Pass QoS information along in LDS updates and replies.

• Use explicit routing information from BGP.

We will now provide some details on how these methods could be used.

In the first option, all querying hosts (e.g. low-level Web caches) measure the time it takes to fetch objects from other servers (Web servers or other caches). This download time gives a crude estimate of the actual bandwidth between the two hosts. The querying host keeps a list of servers with which it has had good connections and prefers those servers to others when both types of servers are present in the LDS reply. Of course, the actual network conditions change all the time, but the results presented in [7] on performance characteristics of mirror servers indicate that, from a large set of mirror servers, only a small number need to be considered as candidates for download. Although the study in [7] uses a fixed number of mirror sites, we believe that the results can be applied to situations where the number of servers changes dynamically.
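A minimal Python sketch of this first option follows. The text only says that the querying host remembers servers with which it has had good connections. The exponentially weighted average, the smoothing weight `alpha`, the `good_threshold` cutoff and the class name `ServerHistory` are assumptions added here for illustration.

```python
class ServerHistory:
    """Remembers a smoothed throughput estimate per server."""

    def __init__(self, alpha=0.25, good_threshold=50_000.0):
        self.alpha = alpha                    # assumed smoothing weight
        self.good_threshold = good_threshold  # assumed cutoff in bytes/second
        self.throughput = {}                  # server address -> bytes/second

    def record_fetch(self, server, num_bytes, seconds):
        """Update the estimate after fetching num_bytes in the given time."""
        sample = num_bytes / seconds
        old = self.throughput.get(server)
        if old is None:
            self.throughput[server] = sample
        else:
            self.throughput[server] = (1 - self.alpha) * old + self.alpha * sample

    def choose(self, candidates):
        """Pick a server from an LDS reply, preferring known good ones."""
        good = [s for s in candidates
                if self.throughput.get(s, 0.0) >= self.good_threshold]
        if good:
            return max(good, key=lambda s: self.throughput[s])
        return candidates[0]  # no known good server: fall back to reply order
```

A host would call `record_fetch` after every download and `choose` on the server list of each LDS reply. Servers that have never been contacted are treated as unknown rather than bad, so they are still used when no good server is listed.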

The second option is to pass some QoS information in the LDS replies along with the age information about the object. For example, a Web server can measure the connection times of incoming connections, estimate the connection bandwidth and take an average of the estimated bandwidths over all connections. This average bandwidth can be seen as the bandwidth that a random client somewhere in the network could expect to get when requesting objects from this server. Likewise, caches can measure the bandwidths of all outgoing connections and calculate the average bandwidth.
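The server-side estimate described above amounts to a simple average over per-connection measurements. The sketch below assumes that each connection is summarized as a pair (bytes transferred, duration in seconds). That representation and the function name `average_bandwidth` are illustrative, not taken from the paper.

```python
def average_bandwidth(connections):
    """Average per-connection bandwidth in bytes/second.

    connections: iterable of (bytes_transferred, duration_seconds) pairs.
    Connections with a non-positive duration are ignored.
    """
    estimates = [size / duration
                 for size, duration in connections
                 if duration > 0]
    if not estimates:
        return None  # nothing measured yet
    return sum(estimates) / len(estimates)
```

The resulting value is what a server or cache could attach to its LDS updates as the QoS hint.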

The third option uses explicit routing information obtained over BGP by talking to local routers. This information gives the host a topological map of the network which can be used to find out how far away each of the servers in the LDS reply is. Unfortunately, routing information gives only reachability, not quality [...] Grimm et al. [4] have studied using routing information in request routing and have decided against using BGP information because of configuration and security issues, the amount of data and the difficulty of obtaining it. Their approach is based on using whois services.

Regardless of the approach chosen, any querying host can implement any local policies necessary when deciding where to forward a request. For example, a cache operator may want to forward requests to other caches based on the domain of the origin server.
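As an illustration of such a local policy, the following Python sketch forwards a request to a preferred cache according to the domain suffix of the origin server. The policy table format, the function name `pick_by_domain` and the fallback to the first candidate are assumptions, not part of the proposal.

```python
from urllib.parse import urlsplit


def pick_by_domain(url, candidates, policy):
    """Choose a server from an LDS reply using a domain-based local policy.

    policy maps a domain suffix (e.g. "fr") to the address of the cache
    that the operator prefers for origin servers in that domain.
    """
    host = urlsplit(url).hostname or ""
    for suffix, preferred in policy.items():
        matches = host == suffix or host.endswith("." + suffix)
        if matches and preferred in candidates:
            return preferred
    return candidates[0]  # no policy applies: fall back to reply order
```

The policy is only applied when the preferred cache actually appears in the LDS reply, so the host never forwards a request to a server that does not hold the object.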

6 Related Research

In [3] Gadde et al. compare traditional hierarchical caching with an architecture where a single centralized server holds information about where each document is cached. When a low-level cache wants to find out where an object is cached, it sends a message to this central mapping server, which responds by either redirecting the request to a peer server or by forwarding it to the origin server. This scheme results in all of the objects being cached at institutional caches. The authors also perform simulations using a small number of caches and find that the performance of the centralized solution is on average better than that of the traditional hierarchy. The number of caches in the simulations is low: only 8- and 32-cache configurations are simulated. The authors express their doubts about the scalability of their solution, but present some ideas for replicating and distributing the location information.

In our approach, the querying host gets a list of all servers holding a copy of the object instead of being redirected to one of them by a mapping server. This puts the querying host in control over where the request will be forwarded. This decision might greatly depend on local conditions and policies unknown to a central server. Another difference in our approach is that the location information is [...]

[...] where data is cached near the browser and location hints are passed through a metadata hierarchy. Their architecture arranges the caches in virtual hierarchies, one for each object, for distributing the metadata information. In their architecture, when a cache caches a copy of an object, it sends out a location hint indicating the URL and its address. This hint is propagated in the virtual metadata hierarchy and could eventually reach all caches in the system. Caches keep track of all the hints they have received and use them to forward requests. If a cache receives a request for an object which is not cached locally but the hints indicate another cache holding the object, the request is forwarded to this cache. The virtual metadata hierarchies are constructed using IP addresses and URLs in a way which tries to minimize the distances between parents and children. Also, the hierarchy guarantees that a cache can receive only one hint for any object. This hint is likely to be from the nearest cache holding the object, but this is not guaranteed.

One difference between their approach and ours is the way caches are placed. In their architecture only institutional caches hold objects and every request forwarded using a hint would be forwarded to another institutional cache. This behavior might be unacceptable to some people who do not want more external traffic on their networks. In our approach, objects are pushed to higher levels and this eliminates the need for going to other institutional caches. A second important difference is that, when using LDS for locating objects, a querying host gets a list of all possible locations. It can then choose from the list according to a local policy or by adapting to current network conditions. In [13] a cache receives only one location hint. Since the virtual metadata trees are based on IP addresses and URLs, there are no guarantees that this hint indicates a good server. We are currently performing a comparison of the two schemes.

Amir et al. [1] study three different methods for redirecting requests to mirrored sites: HTTP redirection, DNS round trip time measurements, and shared IP addresses. In their DNS-based solution they use DNS round trip [...] replicated server for a client. They place the replicated servers near an authoritative DNS server which gives the address of the replicated server when queried for the origin server. This results in clients eventually being redirected to their closest replicated server. There are two major differences between their approach and our replicated server approach (Section 4.1). First, in their system replicated servers must always replicate the entire site, while in our system a server may replicate only subtrees of the original site. Second, in their system the replicated servers must be placed in close proximity to the authoritative DNS server. Our architecture places no constraints on the placement of the servers.

7 Future Work

We will continue our work on LDS by performing quantitative comparisons of the different architectures presented in this paper. In particular we will concentrate on evaluating the amount of new traffic in the network caused by the lookup and update messages. We will also compare the traditional caching hierarchy with our different LDS architectures to find out how much object access latency can be reduced through LDS. We will also closely study the request routing issue in order to find out how much information is needed to make good routing decisions.

8 Conclusion

In this paper we have presented the Location Data System, a new network application that can be used to locate where copies of objects are stored on the Web. LDS provides a service similar to DNS and can be implemented as an extension to DNS and deployed incrementally. We have presented two applications of LDS, locating objects in mirrored servers and locating objects in Web caches. We have also discussed how clients can [...] from the network.

Acknowledgments

We would like to thank Neilze Dorta from INRIA for the discussions we had on accessing mirrored sites and her input on the subject. The log files used in Section 2.1 were provided by the National Science Foundation (grants NCR-9616602 and NCR-9521745), and the National Laboratory for Applied Network Research.

References

[1] Y. Amir, A. Peterson, and D. Shaw, "Seamlessly Selecting the Best Copy from Internet-Wide Replicated Web Servers", in Proc. 12th International Symposium on Distributed Computing (DISC'98), Andros, Greece, September 1998.

[2] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, and T. Berners-Lee, "Hypertext Transfer Protocol – HTTP/1.1", RFC 2068, January 1997.

[3] S. Gadde, J. Chase, and M. Rabinovich, "Directory Structures for Scalable Internet Caches", Technical Report CS-1997-18, Department of Computer Science, Duke University, 1997.

[4] C. Grimm, J.-S. Vöckler, H. Pralle, "Request Routing in Cache Meshes", in Proc. 3rd Web Caching Workshop, Manchester, UK, June 1998.

[5] P. Mockapetris, "Domain Names – Concepts and Facilities", RFC 1034, November 1987.

[6] P. Mockapetris, "Domain Names – Implementation and Specification", RFC 1035, November 1987.

[7] A. Myers, P. Dinda, and H. Zhang, [...] Technical Report CMU-CS-98-157, School of Computer Science, Carnegie Mellon University, 1997.

[8] A Distributed Testbed for National Information Provisioning, <URL:http://ircache.nlanr.net>.

[9] Réseau National de télécommunications pour la Technologie, l'Enseignement et la Recherche, <URL:http://www.renater.fr>.

[10] P. Rodriguez, K. W. Ross, E. W. Biersack, "Improving the WWW: Caching or Multicast?", in Proc. 3rd Web Caching Workshop, Manchester, UK, June 1998.

[11] A. Rousskov, D. Wessels, "Cache Digests", in Proc. 3rd Web Caching Workshop, Manchester, UK, June 1998.

[12] Squid Internet Object Cache, <URL:http://squid.nlanr.net>.

[13] R. Tewari, M. Dahlin, H. M. Vin, and J. S. Kay, "Beyond Hierarchies: Design Considerations for Distributed Caching on the Internet", Technical Report TR98-04, Department of Computer Sciences, University of Texas at Austin.

[14] P. Vixie, S. Thomson, Y. Rekhter, and J. Bound, "Dynamic Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April 1997.

[15] D. Wessels, K. Claffy, "Internet Cache Protocol (ICP), version 2", RFC 2186, [...]
